Article Information

  • Title: The Design and Implementation of a Cleaning System Prototype
  • Authors: He Yang; Weiwei Liu; Xiaohui Wang
  • Journal: IOP Conference Series: Earth and Environmental Science
  • Print ISSN: 1755-1307
  • Online ISSN: 1755-1315
  • Publication year: 2019
  • Volume: 252
  • Issue: 3
  • Pages: 1-8
  • DOI: 10.1088/1755-1315/252/3/032218
  • Publisher: IOP Publishing
  • Abstract: Data is one of the most valuable assets, but raw data is often problematic and ill-suited to training algorithm models. To cope with this, dirty data can be processed with cleaning systems [1] to obtain standard, clean data for statistics, data mining, and other uses. Instead of manually modifying data, writing SQL, or relying on other cumbersome methods that are currently popular for cleaning data, the article proposes an approach that uses the Hadoop big data platform to handle massive data volumes and to clean multiple heterogeneous data sources. Moreover, the system prototype supports custom rules and algorithms and can export results to a specified database, greatly reducing the workload of data cleaning personnel. Based on the system design and theoretical verification presented in the paper, the authors implemented a big data cleaning tool on a big data platform. A typical data cleaning process shows that, on the basis of the proposed theory, data cleaning can be achieved and user operations simplified.