
Article Information

  • Title: Efficient and Effective Duplicate Detection Evaluating Multiple Data using Genetic Algorithm
  • Authors: Dr. M. Mayilvaganan; M. Saipriyanka
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Online ISSN: 2320-9801
  • Publication year: 2015
  • Volume: 3
  • Issue: 9
  • DOI: 10.15680/IJIRCCE.2015.0309080
  • Publisher: S&S Publications
  • Abstract: With the ever-increasing volume of data, data quality problems abound. Duplicates, that is, multiple yet different representations of the same real-world objects in data, are one of the most intriguing data quality problems. The effects of such duplicates are detrimental: for instance, bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, and catalogs are mailed multiple times to the same household. Automatically detecting duplicates is difficult. Duplicate detection is the process of identifying multiple representations of the same real-world entities. Duplicate detection methods now need to process ever larger datasets in ever shorter time, so maintaining the quality of a dataset becomes increasingly difficult. A genetic algorithm is proposed that significantly increases the efficiency of finding duplicates when execution time is limited. It efficiently detects duplicate text documents that have the same content under distinct file names, or different content under the same file name.
  • Keywords: Clustering Algorithm; Genetic Algorithm; progressive SNM; progressive blocks.