Article Information

  • Title: Dynamic Parallelization and Vectorization of Binary Executables on Hierarchical Platforms
  • Authors: Efe Yardimci; Michael Franz
  • Journal: The Journal of Instruction-Level Parallelism
  • Electronic ISSN: 1942-9525
  • Year of publication: 2008
  • Volume: 10
  • Pages: 1-24
  • Publisher: International Symposium on Microarchitecture
  • Abstract: As performance improvements are being increasingly sought via coarse-grained parallelism, established expectations of continued sequential performance increases are not being met. Current trends in computing point toward platforms seeking performance improvements through various degrees of parallelism, with coarse-grained parallelism features becoming commonplace in even entry-level systems. Yet the broad variety of multiprocessor configurations that will be available that differ in the number of processing elements will make it difficult to statically create a single parallel version of a program that performs well on the whole range of such hardware. As a result, there will soon be a vast number of multiprocessor systems that are significantly under-utilized for lack of software that harnesses their power effectively. This problem is exacerbated by the growing inventory of legacy programs in binary executable form with possibly unreachable source code. We present a system that improves the performance of optimized sequential binaries through dynamic recompilation. Leveraging observations made at runtime, a thin software layer recompiles executing code compiled for a uniprocessor and generates parallelized and/or vectorized code segments that exploit available parallel resources. Among the techniques employed are control speculation, loop distribution across several threads, and automatic parallelization of recursive routines. Our solution is entirely software-based and can be ported to existing hardware platforms that have parallel processing capabilities. Our performance results are obtained on real hardware without using simulation. In preliminary benchmarks on only modestly parallel (2-way) hardware, our system already provides speedups of up to 40% on SpecCPU benchmarks, and near-optimal speedups on more obviously parallelizable benchmarks.