Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Electronic ISSN: 1694-0814
Year: 2013
Volume: 10
Issue: 5
Publisher: IJCSI Press
Abstract: Data cleaning is the process of correcting or removing erroneous data caused by contradictions, disparities, keying mistakes, missing bits, etc., in order to produce consistent and reliable information. Text files are used to store simple information, yet they can also be deceptive when they contain dirty data. In this paper we present a solution for cleaning simple text files using a set of data cleaning processes. Although text files are used very often, no robust method exists for cleaning them. Data cleaning plays a crucial role in decision management, which depends on high-quality data, so we have implemented a set of methods to clean text files. We use text files to store data in tabular format; our system checks whether any errors exist and then tries to correct or remove them according to different algorithms.
Keywords: ETL; Data Dictionary; Metaphone; Date Validation Rules; Gender Validation Rules.
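
The abstract describes checking tabular text files for errors and then correcting or removing the offending values, and the keywords name date and gender validation rules. The following is a minimal sketch of what such rules might look like, not the paper's actual method. The accepted date formats, the gender spellings, the tab delimiter, and the column names `dob` and `gender` are all assumptions made for illustration, as are the function names.

```python
import csv
import io
from datetime import datetime

# Hypothetical rules: the abstract does not state which formats or labels are accepted.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def clean_date(value):
    """Return the date in ISO format, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next assumed format
    return None


def clean_gender(value):
    """Map common spellings to a canonical label, or None if unrecognized."""
    return GENDER_MAP.get(value.strip().lower())


def clean_rows(text):
    """Split tab-separated text into corrected rows and rejected rows."""
    cleaned, rejected = [], []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        date = clean_date(row["dob"])        # assumed column name
        gender = clean_gender(row["gender"])  # assumed column name
        if date is None or gender is None:
            rejected.append(row)  # cannot be corrected, so it is removed
        else:
            cleaned.append({**row, "dob": date, "gender": gender})
    return cleaned, rejected


if __name__ == "__main__":
    sample = (
        "name\tdob\tgender\n"
        "Ann\t31/12/1990\tf\n"
        "Bob\t1990-02-30\tM\n"
        "Cy\t1985-07-04\tmale\n"
    )
    ok, bad = clean_rows(sample)
    print(ok)
    print(bad)
```

In this sketch a row is corrected when both fields can be normalized and set aside otherwise, which mirrors the "correct or remove" wording of the abstract. A fuller system in the spirit of the keywords would likely also consult a data dictionary and use Metaphone to repair misspelled names, but those steps are not shown here.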