Article Information

  • Title: DFT: A Novel Algorithm for Data Cleansing
  • Authors: Shweta Taneja; Ishita Ashri; Shipra Gupta
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2014
  • Volume: 5
  • Issue: 2
  • Pages: 2297-2301
  • Publisher: TechScience Publications
  • Abstract: Data cleansing is an essential part of data mining and has become a prerequisite for analysing any kind of data. The data collected by an organisation is enormous and full of errors and inconsistencies, which degrade data quality and affect the results of mining. Several authors have proposed algorithms to deal with such inconsistencies, but little work has been done on date-type fields. Because dates are an integral part of most data, the date field associated with a database must be consistent in all respects. This paper addresses the problems related to date-type fields and the different types of errors that can arise from differing date formats. We propose an algorithm, DFT, that transforms varying date formats into a single consistent format to avoid ambiguity. The data set used to implement the algorithm is taken from the cause lists of the Supreme Court of India. The algorithm shows good results and transforms each date record into a unified format to avoid noise in the database.
  • Keywords: Data Cleaning; Normalisation; Inconsistent Date Formats; Disguised Date Values
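
The abstract describes transforming varying date formats into one consistent format but gives no algorithmic detail, so the following is only a minimal Python sketch of that general idea, not the authors' DFT algorithm. The candidate format list, the ISO-style target format, the day-first reading of ambiguous numeric dates, and the function name normalise_date are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats; the abstract does not enumerate the formats
# that DFT handles. Order matters: the first format that parses wins, and
# numeric dates such as 02/03/2014 are assumed to be day-first.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",   # 2014-02-17
    "%d-%m-%Y",   # 17-02-2014
    "%d/%m/%Y",   # 17/02/2014
    "%d.%m.%Y",   # 17.02.2014
    "%d %B %Y",   # 17 February 2014 (month names depend on the locale)
    "%d %b %Y",   # 17 Feb 2014
    "%B %d, %Y",  # February 17, 2014
]

# Assumed unified output format; the source does not state the target.
TARGET_FORMAT = "%Y-%m-%d"


def normalise_date(raw: str) -> Optional[str]:
    """Return raw rewritten in TARGET_FORMAT, or None if no format fits."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            # Wrong layout or an impossible date (e.g. 31-02-2014).
            continue
        return parsed.strftime(TARGET_FORMAT)
    return None


if __name__ == "__main__":
    for value in ["17-02-2014", "17/02/2014", "2014-02-17", "N/A"]:
        print(repr(value), "->", normalise_date(value))
```

Returning None rather than raising lets a cleansing pass collect unparseable or disguised values (placeholders such as "N/A") for separate review instead of stopping at the first bad record.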