Abstract: Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scientific codes, and pointer-chasing codes. We find that for many applications, software prefetching outperforms locality optimizations when there is sufficient memory bandwidth, but locality optimizations outperform software prefetching under bandwidth-limited conditions. The break-even point (for 1 GHz processors) occurs at roughly 2.26 GBytes/sec on today's memory systems, and will increase on future memory systems. We also study the interactions between software prefetching and locality optimizations when applied in concert. Naively combining the techniques provides robustness to changes in memory bandwidth and latency, but does not yield additional performance gains. We propose and evaluate several algorithms to better integrate software prefetching and locality optimizations, including a modified tiling algorithm, padding for prefetching, and index prefetching. Finally, we investigate the interactions of stride-based hardware prefetching with our software techniques. We find that combining hardware and software prefetching yields similar performance to software prefetching alone, and that locality optimizations enable stride-based hardware prefetching for benchmarks that do not normally exhibit striding.