Article Information

  • Title: Design of SPRINT Parallelization of Data Mining Algorithms Based on Cloud Computing
  • Authors: Lei Song; Huajie Zhang; Dongdong Feng
  • Journal: Engineering Letters
  • Print ISSN: 1816-093X
  • Online ISSN: 1816-0948
  • Publication year: 2022
  • Volume: 30
  • Issue: 2
  • Pages: 399-405
  • Language: English
  • Publisher: Newswood Ltd
  • Abstract: Traditional data mining loads all data into memory for analysis and computation. This stand-alone computing mode has low computational efficiency and a high mining failure rate. As data storage and computer technology develop rapidly, storing and processing big data effectively has become an important problem. Cloud computing can quickly obtain resources from a shared computing resource pool and supports parallel versions of data mining algorithms, so combining a cloud computing platform with data mining can relieve the bottlenecks of the traditional data mining process. Based on the Hadoop cloud computing platform, this paper uses the MapReduce programming framework to propose a parallel design of decision tree nodes, node attribute metrics, and Gini index ranking for the SPRINT decision tree algorithm. The parallelized SPRINT algorithm is tested for classification accuracy, scalability, and speedup ratio. The results indicate that the parallel design achieves good scalability and parallel speedup while preserving classification accuracy, which supports the feasibility of parallelizing data mining algorithms on a cloud computing platform.
  • Keywords: data mining; cloud computing; SPRINT algorithm; parallel design
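
The abstract mentions parallelizing the node attribute metrics and the Gini index ranking of SPRINT. As a rough illustration only, and not the authors' Hadoop implementation, the sketch below evaluates each attribute list independently (a map-like step) and then keeps the attribute with the lowest weighted Gini index (a reduce-like step). All function names and the sample data are hypothetical, and plain Python stands in for MapReduce.

```python
from collections import Counter


def gini(counts):
    """Gini index of a class histogram: 1 - sum of squared class shares."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_numeric_split(attribute_list):
    """Scan one attribute list of (value, label) pairs in sorted order.

    Returns (weighted_gini, threshold) for the best binary split
    'value <= threshold', or None if all values are identical.
    """
    records = sorted(attribute_list, key=lambda r: r[0])
    n = len(records)
    below = Counter()                               # labels at or below the split
    above = Counter(label for _, label in records)  # labels above the split
    best = None
    for i in range(n - 1):
        value, label = records[i]
        below[label] += 1
        above[label] -= 1
        next_value = records[i + 1][0]
        if value == next_value:
            continue  # cannot split between equal values
        n_below = i + 1
        score = (n_below / n) * gini(below) + ((n - n_below) / n) * gini(above)
        threshold = (value + next_value) / 2
        if best is None or score < best[0]:
            best = (score, threshold)
    return best


def map_attribute(name, attribute_list):
    """Map-like step: each attribute list is scored independently."""
    return name, best_numeric_split(attribute_list)


def reduce_best(mapped):
    """Reduce-like step: keep the attribute with the smallest weighted Gini."""
    candidates = [(res[0], name, res[1]) for name, res in mapped if res is not None]
    if not candidates:
        return None
    score, name, threshold = min(candidates)
    return name, threshold, score


# Hypothetical training data: one (value, class label) list per attribute.
attribute_lists = {
    "age": [(23, "high"), (17, "high"), (43, "high"),
            (68, "low"), (32, "low"), (20, "high")],
    "income": [(10, "high"), (20, "low"), (30, "high"),
               (40, "low"), (50, "high"), (60, "high")],
}

mapped = [map_attribute(name, lst) for name, lst in attribute_lists.items()]
best_split = reduce_best(mapped)
print(best_split)
```

With this sample data the selected split is on `age` at threshold 27.5. Because each attribute list is scanned without reference to the others, the map-like step is the part that a framework such as MapReduce could distribute across workers; how the paper actually partitions the work is not stated in the abstract.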