Basic Article Information

  • Title:An Improved Classification Course Based on Mapreduce
  • Authors:Haitao Wang ; Shufeng Liu ; Zongpu Jia
  • Journal:International Journal of Grid and Distributed Computing
  • Print ISSN:2005-4262
  • Publication year:2015
  • Volume:8
  • Issue:3
  • Pages:43-52
  • DOI:10.14257/ijgdc.2015.8.3.05
  • Publisher:SERSC
  • Abstract:File classification is an important step in near-duplicate detection in the data mining field. This paper proposes an improved classification course consisting of a training course and a test course, each with its own corresponding algorithm. It uses the MapReduce computing model created by Google to carry out the classification computation. Specifically, Sogou news data sets of various sizes, simulating a massive data set, were used to test effectiveness, and a comparative evaluation of execution time and speedup was carried out in the experimental environment. The results show that the classification course greatly reduces execution time and achieves the ideal speedup ratio as the data amount increases, giving better performance.
  • Keywords:Classification; Naïve Bayes; Algorithm; MapReduce; Massive Data