Article Information

Title: Design of SPRINT Parallelization of Data Mining Algorithms Based on Cloud Computing
Authors: Lei Song; Huajie Zhang; Dongdong Feng; et al.
Journal: Engineering Letters
Print ISSN: 1816-093X
Electronic ISSN: 1816-0948
Publication year: 2022
Volume: 30
Issue: 2
Pages: 399-405
Language: English
Publisher: Newswood Ltd
Abstract: Traditional data mining loads all data into memory for analysis and calculation. This stand-alone computing mode has low computational efficiency and a high mining failure rate. As data storage and computer technology develop rapidly, storing and processing big data effectively has become an important problem. Cloud computing can quickly obtain resources from a computing resource pool and supports the parallelization of data mining algorithms, which combines the cloud computing platform with data mining efficiently and relieves the bottlenecks of traditional data mining. Therefore, based on the Hadoop cloud computing platform, this paper makes full use of the characteristics of the MapReduce programming framework and proposes a parallel design of decision tree nodes, node attribute metrics, and Gini index ranking for the SPRINT decision tree algorithm. The parallelized SPRINT algorithm is tested for classification accuracy, scalability, and speedup ratio. The results indicate that the parallel design achieves good scalability and parallel speedup while preserving classification accuracy, which verifies the feasibility of parallelizing data mining algorithms on a cloud computing platform.
Keywords: data mining; cloud computing; SPRINT algorithm; parallel design