Journal: International Journal of Reviews in Computing
Print ISSN: 2076-3328
Electronic ISSN: 2076-3336
Year of publication: 2012
Volume: 12
Pages: 59-67
Publisher: Little Lion Scientific Research and Development
Abstract: Finding frequent itemsets is one of the most important tasks in data mining. The Apriori algorithm is the most established algorithm for finding frequent itemsets in a transactional dataset; however, it must scan the dataset many times and generate many candidate itemsets. When the dataset is huge, both memory use and computational cost can be very expensive. In addition, a single processor’s memory and CPU resources are very limited, which makes the algorithm inefficient. Parallel and distributed computing are effective strategies for accelerating algorithm performance. In this paper, we implement an efficient MapReduce Apriori algorithm (MRApriori) based on the Hadoop MapReduce model that needs only two phases (MapReduce jobs) to find all frequent k-itemsets, and we compare it with two existing algorithms that need either one or k phases (where k is the maximum length of the frequent itemsets) to find the same frequent k-itemsets. Experimental results show that the proposed MRApriori algorithm outperforms the other two algorithms.
Keywords: Hadoop; MapReduce; Parallel Computing; Distributed Computing; Apriori Algorithm; Frequent Itemset; Data Mining; Association Rule
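
Illustrative sketch: The abstract says that MRApriori needs two MapReduce jobs but does not describe what each job does. As an illustration only, and not necessarily the authors' design, the following Python sketch shows one well-known way to find all frequent itemsets in two passes, the partition-based scheme often called SON. In the first pass each data split is mined locally with ordinary level-wise Apriori and the local results are unioned into a candidate set; in the second pass the candidates are counted over the whole dataset and filtered by the global threshold. The function names `local_apriori` and `two_phase_frequent_itemsets` are hypothetical, and the map and reduce steps are simulated with plain loops in one process rather than run on Hadoop.

```python
from collections import Counter
from itertools import combinations
import math


def local_apriori(transactions, min_count):
    """Level-wise Apriori on one split; returns itemsets with count >= min_count."""
    items = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, c in items.items() if c >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # One scan of the split per level to count the surviving candidates.
        current = {
            c for c in candidates
            if sum(1 for t in transactions if c <= t) >= min_count
        }
        frequent |= current
        k += 1
    return frequent


def two_phase_frequent_itemsets(splits, min_support):
    """splits: list of lists of sets; min_support: fraction of all transactions."""
    total = sum(len(s) for s in splits)

    # Phase 1, map: mine each split with a proportionally scaled threshold.
    # Phase 1, reduce: union the local results into one global candidate set.
    candidates = set()
    for split in splits:
        local_min = max(1, math.ceil(min_support * len(split)))
        candidates |= local_apriori(split, local_min)

    # Phase 2, map: count every candidate in every split.
    # Phase 2, reduce: sum the counts and keep the globally frequent itemsets.
    counts = Counter()
    for split in splits:
        for t in split:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
    global_min = math.ceil(min_support * total)
    return {c: n for c, n in counts.items() if n >= global_min}
```

The scheme is complete because an itemset that meets the support fraction over the whole dataset must meet it in at least one split, so it always appears among the phase 1 candidates; phase 2 then removes candidates that were frequent only locally.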