
Article Information

  • Title: Enhanced Technique for Data Cleaning in Text File
  • Authors: Arup Kumar Bhattacharjee; Atanu Mallick; Arnab Dey
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Electronic ISSN: 1694-0814
  • Publication year: 2013
  • Volume: 10
  • Issue: 5
  • Publisher: IJCSI Press
  • Abstract: Data cleaning is the process of correcting or removing erroneous data caused by contradictions, disparities, keying mistakes, missing bits, etc., to create consistent and reliable information. Text files are used to store simple information, and they can also be misleading when they contain dirty data. In this paper we provide a solution for cleaning up simple text files using several data cleaning processes. Although text files are used very often, no robust method exists to clean them. Data cleaning plays a crucial role in decision management, which depends on high-quality data, so we have implemented a set of methods to clean text files. Here we use text files to store data in tabular format; our system checks whether any errors exist and then tries to correct or remove them according to different algorithms.
  • Keywords: ETL; Data Dictionary; Metaphone; Date Validation Rules; Gender Validation Rules