Abstract: Hardware data prefetching is widely adopted to hide long memory latency. A hardware data prefetcher predicts the memory address that will be accessed in the near future and fetches the data at the predicted address into the cache memory in advance. To detect memory access patterns such as a constant stride, most existing prefetchers use differences between addresses in a sequence of memory accesses. However, prefetching based on these differences often fails to detect memory access patterns when aggressive optimizations are applied. For example, out-of-order execution changes the memory access order, which causes inaccurate prediction because the sequence of memory addresses used to calculate the differences is changed by the optimization.

To overcome the problems of existing prefetchers, we propose Access Map Pattern Matching (AMPM). The AMPM prefetcher has two key components: a memory access map and hardware pattern matching logic. The memory access map is a bitmap-like data structure that holds past memory accesses. The AMPM prefetcher divides the memory address space into memory regions of a fixed size, and each memory access map is mapped to one such region. Each entry in the bitmap-like data structure corresponds to one cache line in the region. Once the bitmap is mapped to the memory region, each entry records whether the corresponding line has already been accessed. The AMPM prefetcher detects memory access patterns from the bitmap-like data structure mapped to the accessed region. The hardware pattern matching logic is used to detect stride access patterns in the memory access map. The result of pattern matching is affected by neither the memory access order nor the instruction addresses, because the bitmap-like data structure holds neither information that reveals the order of past memory accesses nor the instruction addresses. Therefore, the AMPM prefetcher achieves high performance even when such aggressive optimizations are applied.

The AMPM prefetcher is evaluated by performing cycle-accurate simulations using the memory-intensive benchmarks in SPEC CPU2006 and the NAS Parallel Benchmarks. In an aggressively optimized environment, the AMPM prefetcher improves prefetch coverage, while the other state-of-the-art prefetcher degrades prefetch coverage significantly. As a result, the AMPM prefetcher increases IPC by 32.4% compared to the state-of-the-art prefetcher.
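
To make the idea of the memory access map and the pattern matching concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes 4 KB regions, 64-byte cache lines, a small set of candidate strides, and a hypothetical issue_prefetch() hook into the cache model; the rule "if lines idx-k and idx-2k were accessed, prefetch idx+k" is one simple way to realize stride matching over the bitmap and is only an illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of an AMPM-style memory access map (assumed parameters). */
#define LINE_SIZE        64
#define REGION_SIZE      4096
#define LINES_PER_REGION (REGION_SIZE / LINE_SIZE)  /* 64 entries */
#define MAX_STRIDE       8                          /* candidate strides to test */

typedef struct {
    uint64_t region_base;                 /* base address of the mapped region */
    bool     accessed[LINES_PER_REGION];  /* one entry per cache line          */
} access_map_t;

/* Stand-in for handing a prefetch request to the cache hierarchy. */
static void issue_prefetch(uint64_t line_addr)
{
    printf("prefetch 0x%llx\n", (unsigned long long)line_addr);
}

/* Record a demand access and run pattern matching over the bitmap.
 * Because the map stores only "accessed or not" per line, the matching
 * result does not depend on the order in which the accesses arrived. */
static void ampm_access(access_map_t *map, uint64_t addr)
{
    int idx = (int)((addr - map->region_base) / LINE_SIZE);
    if (idx < 0 || idx >= LINES_PER_REGION)
        return;                           /* address falls outside this region */

    map->accessed[idx] = true;

    /* For each candidate stride k: if the two previous lines on that
     * stride (idx-k, idx-2k) are already set in the bitmap, prefetch the
     * next line on the stride (idx+k) unless it was accessed already. */
    for (int k = 1; k <= MAX_STRIDE; k++) {
        int prev1 = idx - k, prev2 = idx - 2 * k, next = idx + k;
        if (prev2 >= 0 && next < LINES_PER_REGION &&
            map->accessed[prev1] && map->accessed[prev2] &&
            !map->accessed[next]) {
            issue_prefetch(map->region_base + (uint64_t)next * LINE_SIZE);
        }
    }
}

int main(void)
{
    access_map_t map;
    memset(&map, 0, sizeof map);
    map.region_base = 0x10000;

    /* A stride-2 access stream delivered out of order: lines 4, 0, 2, 6.
     * The bitmap still matches the stride and prefetches line 8. */
    uint64_t offsets[] = { 4 * 64, 0 * 64, 2 * 64, 6 * 64 };
    for (size_t i = 0; i < sizeof offsets / sizeof offsets[0]; i++)
        ampm_access(&map, map.region_base + offsets[i]);
    return 0;
}
```

In the example, the accesses arrive out of order, yet once lines 0, 2, 4, and 6 are set in the bitmap, the stride-2 pattern is matched and line 8 is prefetched, which is the order-independence property the abstract describes.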