Article Information

  • Title: Yet another Hybrid Strategy for Auto-tuning SpMV on GPUs
  • Authors: Zhaohui Wang; Xiaojie Qiu; Aimin Zhang
  • Journal: International Journal of Software Engineering and Its Applications
  • Print ISSN: 1738-9984
  • Year: 2015
  • Volume: 9
  • Issue: 5
  • Pages: 97-106
  • DOI: 10.14257/ijseia.2015.9.5.10
  • Publisher: SERSC
  • Abstract: Sparse matrix-vector multiplication (SpMV) is a key linear algebra kernel, widely used across many application domains. Beyond multi-core architectures, extensive research has focused on accelerating SpMV on many-core Graphics Processing Units (GPUs). SpMV involves many indirect and irregular memory accesses, and load imbalance can occur when its computations are mapped onto single-instruction, multiple-data (SIMD) GPUs. Because SpMV is heavily memory-bandwidth-bound, its performance on GPUs remains unsatisfactory even though GPUs offer massive computational resources. In this paper, we present a new hybrid strategy for auto-tuning SpMV on GPUs that combines the advantages of row-major storage and column-major storage. Like many other strategies, ours reorders a given sparse matrix by row length in decreasing order. To be more adaptive and efficient, we propose a new hybrid Blocked CSR and JDS (BCJ) format based on the original CSR and JDS formats. After reordering, BCJ splits a sparse matrix into a denser part and a sparser part and uses a different kernel to process each part (see the sketch after this list). We also propose a corresponding auto-tuning framework that transforms the matrix and launches kernels according to its sparsity characteristics. A CUDA implementation of BCJ significantly outperforms the original formats on a broad range of unstructured sparse matrices.
  • Keywords: SpMV; GPU; auto-tuning; parallel programming; CSR; JDS; BCJ; CUDA
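
As a rough illustration of the split the abstract describes, here is a minimal CUDA sketch, assuming the matrix has already been reordered by decreasing row length and split at a tuner-chosen row. A plain scalar CSR kernel stands in for the paper's Blocked CSR part on the denser rows, a jagged-diagonal (JDS) kernel handles the sparser rows, and a permutation array `perm` (our naming) scatters results back to the original row order. All kernel names and signatures here are hypothetical; this record does not include the authors' actual BCJ kernels.

// Minimal sketch of the two-part hybrid described in the abstract.
// Assumptions (not from the paper): rows are already reordered by
// decreasing length, `perm` maps reordered rows back to original row
// indices, and `splitRow` comes from the auto-tuner.
#include <cuda_runtime.h>

// Denser top rows: a scalar CSR kernel, one thread per row (the
// paper uses a Blocked CSR variant; this is a simplified stand-in).
__global__ void spmv_csr_part(int nRows, const int *rowPtr,
                              const int *colIdx, const float *val,
                              const int *perm, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nRows) return;
    float sum = 0.0f;
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        sum += val[j] * x[colIdx[j]];
    y[perm[row]] = sum;  // scatter back to the original row order
}

// Sparser bottom rows: a JDS kernel. Values are stored jagged-diagonal
// by jagged-diagonal (column-major), so consecutive threads read
// consecutive addresses and loads coalesce.
__global__ void spmv_jds_part(int nRows, int nDiags, const int *jdPtr,
                              const int *colIdx, const float *val,
                              const int *perm, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nRows) return;
    float sum = 0.0f;
    for (int d = 0; d < nDiags; ++d) {
        // Diagonal d holds one entry for each of the first
        // (jdPtr[d+1] - jdPtr[d]) rows; because rows are sorted by
        // decreasing length, once this row falls off a diagonal it
        // falls off all later (shorter) ones, so we can break.
        if (row >= jdPtr[d + 1] - jdPtr[d]) break;
        int k = jdPtr[d] + row;
        sum += val[k] * x[colIdx[k]];
    }
    y[perm[row]] = sum;
}

// Host side: the auto-tuner picks splitRow from row-length statistics,
// then each part is processed by its own kernel (illustrative launch).
void spmv_bcj(int splitRow, int nRows, /* device pointers: */
              const int *rowPtr, const int *csrCol, const float *csrVal,
              int nDiags, const int *jdPtr, const int *jdsCol,
              const float *jdsVal, const int *perm,
              const float *x, float *y) {
    const int T = 256;
    spmv_csr_part<<<(splitRow + T - 1) / T, T>>>(
        splitRow, rowPtr, csrCol, csrVal, perm, x, y);
    int sparseRows = nRows - splitRow;
    spmv_jds_part<<<(sparseRows + T - 1) / T, T>>>(
        sparseRows, nDiags, jdPtr, jdsCol, jdsVal, perm + splitRow, x, y);
}

The design intuition matches the abstract: long rows keep enough work per row for a row-major (CSR-style) kernel, while the short tail of rows would leave threads idle, so a column-major (JDS-style) layout processes them with coalesced accesses instead.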