Abstract: This paper provides a quantitative evaluation of load latency tolerance in a dynamically scheduled processor. To determine the latency tolerance of each memory load operation, our simulations use flexible load completion policies instead of a fixed memory hierarchy that dictates the latency. Although our policies delay load completion as long as possible, they produce performance, measured in instructions committed per cycle (IPC), comparable to a processor with an ideal memory system where all loads complete in one cycle. Our simulations reveal that to produce IPC values within 12% of a processor with an ideal memory system, between 1% and 71% of loads need to be satisfied within a single cycle and that up to 74% can be satisfied in as many as 32 cycles, depending on the benchmark and processor configuration. Load latency tolerance is largely determined by whether a mispredicted branch is in the load's data dependence graph and the depth of the dependence graph. Our results show that up to 36% of all loads miss in the level one cache yet have latency demands lower than second level cache access times. We also show that a similar percentage of loads hit in the level one cache even though they possess enough latency tolerance to be satisfied by lower levels of the memory hierarchy.