Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Year of publication: 2014
Volume: 5
Issue: 2
Pages: 2297-2301
Publisher: TechScience Publications
Abstract: Data cleansing is an essential part of data mining and has become a prerequisite for analysing any kind of data. The data collected by an organisation is enormous and full of errors and inconsistencies, which degrade data quality and affect mining results. Several authors have proposed algorithms to deal with such inconsistencies, but little work has been done on date-type fields. Because dates are an integral part of any data set, the date fields in a database must be consistent in all respects. This paper addresses the problems associated with date-type fields and the types of errors that arise from differing date formats. We propose an algorithm, DFT, that transforms varying date formats into a single consistent format to avoid ambiguity. The data set used to implement the algorithm is taken from the cause lists of the Supreme Court of India. The algorithm shows good results and transforms each date record into a unified format to avoid noise in the database.
Keywords: Data Cleaning; Normalisation; Inconsistent Date Formats; Disguised Date values
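
The abstract describes transforming varying date formats into one consistent format but does not give the steps of DFT. The following is a minimal Python sketch of that general idea only, not the paper's algorithm. The function name `normalise_date`, the list `CANDIDATE_FORMATS`, the ISO 8601 target format, and the day-first reading of numeric dates are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate formats, tried in order. Numeric dates are
# assumed to be day-first; this is an assumption, not taken from the paper.
CANDIDATE_FORMATS = (
    "%d-%m-%Y",   # 05-02-2014
    "%d/%m/%Y",   # 05/02/2014
    "%d.%m.%Y",   # 05.02.2014
    "%Y-%m-%d",   # 2014-02-05
    "%d %B %Y",   # 5 February 2014
    "%d %b %Y",   # 5 Feb 2014
    "%B %d, %Y",  # February 5, 2014
)


def normalise_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no candidate format matches."""
    text = " ".join(raw.split())  # trim and collapse runs of whitespace
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format or impossible date; try the next one
        return parsed.strftime("%Y-%m-%d")
    return None  # unrecognised value, left for manual review
```

Under these assumptions, `normalise_date("05/02/2014")` and `normalise_date("February 5, 2014")` both yield `"2014-02-05"`, while an impossible date such as `"31/02/2014"` yields `None` and is not silently coerced. Returning `None` keeps unrecognised or disguised values visible for later inspection. Month-name parsing relies on the default English locale.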