Journal: International Journal of Advanced Computer Research
Print ISSN: 2249-7277
Online ISSN: 2277-7970
Year: 2020
Volume: 10
Issue: 48
Pages: 116-127
DOI: 10.19101/IJACR.2020.1048023
Publisher: Association of Computer Communication Education for National Triumph (ACCENT)
Abstract: In recent years, data has been growing continuously at an exponential rate. Even in its processed form, large data is difficult to understand and analyze. One of the best approaches to handling large data is to represent it in graph format. These graphs can be used for data analysis, processing, and decision making. Because the volume of data is huge, the graphs generated from it are also huge, making them difficult to process. Thus, a modular approach of studying them by partitioning is much more effective. Several built-in partitioning techniques are available in Apache Spark, and these can be extended to graph partitioning according to user needs. Graph partitioning is an active area of research aimed at increasing the accuracy and speed of the algorithms. This research work aims to build a high-performance graph partitioning technique for Apache Spark that makes it faster, scalable, and efficient. In this paper, a custom algorithm that can dynamically divide a graph into an optimal number of partitions in Apache Spark is proposed. A novel method to calculate the distance between nodes and similarity indexes to partition the data is introduced. The optimal number of partitions is decided using similarity indexes, which are calculated using the concept of the Laplacian matrix and eigenvalues. The proposed algorithm is implemented and its performance is compared to existing algorithms in Apache Spark. The results indicate that the algorithm partitions graphs with a huge number of vertices with considerable efficiency and reasonable computing cost.
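The abstract does not give the exact formula behind the similarity indexes, but the stated idea of choosing the number of partitions from the Laplacian matrix and its eigenvalues is commonly realized with the eigengap heuristic from spectral clustering. The sketch below is a minimal, hypothetical illustration of that standard heuristic (not the paper's actual algorithm): build the unnormalized Laplacian L = D - A from an adjacency matrix, sort its eigenvalues, and pick the partition count at the largest gap among the smallest eigenvalues.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A, where D is the degree matrix."""
    deg = np.diag(adj.sum(axis=1))
    return deg - adj

def estimate_partitions(adj, max_k=10):
    """Eigengap heuristic: the number of well-separated clusters is the
    index (1-based) of the largest gap among the smallest eigenvalues
    of the graph Laplacian."""
    eigvals = np.sort(np.linalg.eigvalsh(laplacian(adj)))[:max_k]
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1

# Toy example: two disconnected triangles should yield two partitions.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1
print(estimate_partitions(A))  # -> 2
```

For a graph with k connected components the Laplacian has exactly k zero eigenvalues, so the first large gap cleanly recovers k; on connected but loosely coupled graphs the heuristic gives an approximate, rather than exact, partition count.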