Article Information

  • Title: A Hybrid Sorting Algorithm on Heterogeneous Architectures
  • Authors: Ming Xu ; Xianbin Xu ; Fang Zheng
  • Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
  • Print ISSN: 2302-9293
  • Year: 2015
  • Volume: 13
  • Issue: 4
  • Pages: 1399-1407
  • DOI: 10.12928/telkomnika.v13i4.1896
  • Language: English
  • Publisher: Universitas Ahmad Dahlan
  • Abstract: High-performance computing devices are now more common than ever. Main memory capacities have grown very large, and CPUs have gained more cores and more powerful computing units. A growing number of machines also include accelerators such as GPUs. Taking full advantage of modern machines with heterogeneous architectures to obtain higher-performance solutions is a real challenge. There is a large body of literature on using only CPUs or only GPUs; research on algorithms that utilize heterogeneous architectures, however, is comparatively scarce. In this paper, we propose a novel hybrid sorting algorithm in which the CPU cooperates with the GPU. To fully utilize the computing capability of both the CPU and the GPU, we used SIMD intrinsic instructions to implement the sorting kernels that run on the CPU, and adopted radix sort kernels implemented in CUDA (Compute Unified Device Architecture) that run on the GPU. The performance evaluation is promising: our algorithm can sort one billion 32-bit floating-point values in no more than 5 seconds.
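
The abstract mentions radix sort on 32-bit floats and a CPU/GPU split but gives no implementation details. The sketch below is only a rough illustration of the general idea, not the authors' method. It assumes inputs that are exactly representable as 32-bit floats and contain no NaN values, and it assumes a simple "sort two parts, then merge" scheme. The names `float_to_key`, `key_to_float`, `radix_sort_floats`, `hybrid_sort`, and the `device_fraction` parameter are hypothetical, and plain Python stands in for both the CUDA kernels and the SIMD kernels.

```python
import struct
from heapq import merge


def float_to_key(x: float) -> int:
    """Map a 32-bit float to an unsigned integer whose order matches float order."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative floats: flip every bit. Non-negative floats: set the sign bit.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000


def key_to_float(key: int) -> float:
    """Invert float_to_key."""
    bits = key ^ 0x80000000 if key & 0x80000000 else key ^ 0xFFFFFFFF
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x


def radix_sort_floats(values):
    """Least-significant-digit radix sort: four stable passes of 8 bits each."""
    keys = [float_to_key(v) for v in values]
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for k in keys:
            buckets[(k >> shift) & 0xFF].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return [key_to_float(k) for k in keys]


def hybrid_sort(values, device_fraction=0.5):
    """Sort two partitions independently, then merge (assumed scheme)."""
    split = int(len(values) * device_fraction)
    device_part = radix_sort_floats(values[:split])  # stand-in for a GPU radix sort
    host_part = sorted(values[split:])               # stand-in for a CPU SIMD sort
    return list(merge(device_part, host_part))
```

In a real heterogeneous setting the two partitions would be sorted concurrently, and the split ratio would presumably be tuned to the relative throughput of the two devices; neither detail is taken from the abstract.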