期刊名称:International Journal of Multimedia and Ubiquitous Engineering
印刷版ISSN:1975-0080
出版年度:2014
卷号:9
期号:2
页码:199-212
出版社:SERSC
摘要:Processing very large graphs efficiently is a challenging task. Distributed graph processing systems process the billion-scale graphs efficiently but incur overheads of partitioning and distribution of the large graph over a cluster of nodes. In order to overcome these problems a disk-based engine, GraphChi was proposed recently that processes the graph in chunks on a single PC. GraphChi significantly outperformed all the representative distributed processing frameworks. Still, we observe that GraphChi incurs some serious degradation in performance due to 1) high number of non-sequential I/Os for processing every chunk of the graph; and 2) limited parallelism to process the graph. In this paper, we propose a novel engine named BiShard Parallel Processor (BSPP) to efficiently process billions-scale graphs on a single PC. We introduce a new storage structure BiShard. BiShard divides the large graph into subgraphs and maintains the in and out edges separately. This storage mechanism significantly reduces the number of non-sequential I/Os. We implement a new processing model named BiShard Parallel (BSP) on top of Bishard. BSP exploits the properties of Bishard to enable full CPU parallelism for processing the graph. Our experiments on real large graphs show that our solution significantly outperforms GraphChi