Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Year: 2017
Volume: 5
Issue: 4
Page: 8271
DOI: 10.15680/IJIRCCE.2017.05040219
Publisher: S&S Publications
Abstract: As the volume of data on the web grows day by day, data retrieval and persistence must be used and improved accordingly. Current parallel mining techniques rely on clustering and frequent itemset mining, and they critically lack automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. The existing system works on top of the FiDoop framework, which suffers from latency and a low threshold for handling large clusters. To ensure automatic parallelization, dynamic sharding, and automatic instancing, a new framework for frequent itemset mining, FIMN, is built on top of the Neo4j graph database, applying the Random Forest algorithm, which performs several classifications of the uploaded data. In Neo4j, MapReduce jobs are implemented to complete the mining task. In the MapReduce job, the mappers independently decompose itemsets; the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. A survey of the FIMN framework on Neo4j suggests that it will offer high scalability and efficiency on application servers, with no latency issues.
Keywords: FIMN; frequent itemsets; MapReduce; Random Forest; Neo4j
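
The mapper and reducer roles described in the abstract can be illustrated with a minimal, single-process sketch. It is not the FIMN implementation: it does not use Neo4j, Random Forest, or ultrametric trees, and it replaces the tree-based combination step with plain support counting. The function names (`map_transaction`, `shuffle`, `reduce_counts`, `frequent_itemsets`), the `max_size` cap, and the sample transactions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_transaction(transaction, max_size):
    """Mapper: decompose one transaction into sub-itemsets, emitting (itemset, 1)."""
    items = sorted(set(transaction))  # canonical order so equal itemsets share a key
    for k in range(1, min(max_size, len(items)) + 1):
        for subset in combinations(items, k):
            yield subset, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the MapReduce shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups, min_support):
    """Reducer: sum the counts per itemset and keep those meeting min_support."""
    supports = {itemset: sum(values) for itemset, values in groups.items()}
    return {itemset: n for itemset, n in supports.items() if n >= min_support}


def frequent_itemsets(transactions, min_support, max_size=3):
    """Run the map, shuffle, and reduce steps sequentially over all transactions."""
    pairs = (pair for t in transactions for pair in map_transaction(t, max_size))
    return reduce_counts(shuffle(pairs), min_support)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    transactions = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["butter", "milk"],
        ["bread", "butter"],
    ]
    result = frequent_itemsets(transactions, min_support=2, max_size=2)
    for itemset, count in sorted(result.items()):
        print(itemset, count)
```

Because each mapper call depends only on its own transaction, the map step could in principle be distributed across workers, which is the property the abstract refers to as independent decomposition. Enumerating every subset grows exponentially with transaction length, so the `max_size` cap is a simplification made for this sketch.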