Article information

  • Title: A review on methods of Duplicate Detection
  • Authors: Ms. Laxmi R Adhav; Ms. Monali A. Gurule
  • Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
  • Print ISSN: 2278-1323
  • Year of publication: 2017
  • Volume: 6
  • Issue: 5
  • Pages: 610-613
  • Publisher: Shri Pannalal Research Institute of Technology
  • Abstract: Duplicate detection is a major task in data processing and cleaning. This paper discusses various methods of duplicate detection for a given dataset. Calculating edit distance is the most preferred approach to duplicate detection, and methods such as EdJoin and Winnowing are based on it. Strings can be divided into a number of small substrings known as grams; the VGRAM algorithm uses this gram-based approach. When calculating edit distance, strings can also be divided into a number of small strings called chunks; the VChunkJoin algorithm uses this chunking scheme. The methods are compared on their results to determine which detects duplicate records best.
  • Keywords: CDB; Chunk; Edit Distance; Gram; Virtual CDB.
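
The two building blocks named in the abstract, edit distance and grams, can be illustrated with a minimal Python sketch. It is not taken from the paper and does not implement EdJoin, Winnowing, VGRAM, or VChunkJoin: it uses plain Levenshtein distance and fixed-length grams for simplicity, and the function names and the `max_edits` threshold are hypothetical choices made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 2) -> list[str]:
    """All overlapping substrings of length q (fixed-length grams)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def is_duplicate(a: str, b: str, max_edits: int = 1) -> bool:
    """Treat two strings as duplicates if they are within max_edits edits.

    max_edits is an assumed, illustrative threshold, not one from the paper.
    """
    return edit_distance(a, b) <= max_edits
```

For example, `is_duplicate("Jon Smith", "John Smith")` holds with the default threshold because the strings differ by one inserted character. Gram-based and chunk-based methods typically use shared substrings to discard unlikely pairs cheaply before computing the exact distance.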