Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2016
Volume: 7
Issue: 6
Pages: 2491-2498
Publisher: TechScience Publications
Abstract: Existing mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a two-step algorithm for mining frequent itemsets using the MapReduce programming model. To achieve minimum running time for the corresponding minimum support, the FP-growth algorithm is used. This paper incorporates the mining of frequent items using FP trees. Three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets and the reducers perform combination operations. To optimize the mining process and to measure load balance across the cluster's computing nodes, FiDoop-HD, an extension of FiDoop, is used to speed up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
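
The decompose-then-combine pattern described for the third MapReduce job can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: it builds no FP tree, runs on a single machine rather than on Hadoop, and the names `map_decompose`, `reduce_combine`, `mine`, and the `max_size` cap are hypothetical, introduced only for illustration. It shows mappers emitting sub-itemsets independently per transaction and a reducer summing counts and applying the minimum support threshold.

```python
from collections import defaultdict
from itertools import combinations


def map_decompose(transaction, max_size):
    """Mapper (hypothetical): decompose one transaction into sub-itemsets.

    Emits (itemset, 1) for every non-empty subset of up to max_size items.
    Items are sorted so the same itemset always yields the same key.
    """
    items = sorted(set(transaction))
    for k in range(1, min(max_size, len(items)) + 1):
        for subset in combinations(items, k):
            yield subset, 1


def reduce_combine(pairs, min_support):
    """Reducer (hypothetical): sum counts per itemset, keep frequent ones."""
    counts = defaultdict(int)
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_support}


def mine(transactions, min_support, max_size=3):
    """Run the map and reduce phases sequentially (no real cluster)."""
    pairs = (p for t in transactions for p in map_decompose(t, max_size))
    return reduce_combine(pairs, min_support)


# Toy data, invented for this example only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
frequent = mine(transactions, min_support=3)
for itemset, count in sorted(frequent.items()):
    print(itemset, count)
```

Because each mapper call depends only on its own transaction, the map phase can run in parallel across nodes. Enumerating subsets grows exponentially with transaction length, which is one reason a compact structure such as an FP tree is preferred in practice.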