Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Year: 2016
Volume: 7
Issue: 3
Pages: 1549-1551
Publisher: TechScience Publications
Abstract: Duplicate records are a major data quality concern in large databases. To detect them, entity resolution (also known as duplicate detection or record linkage) is used as part of the data cleaning process to identify records that potentially refer to the same real-world entity. Among existing approaches, the progressive duplicate detection method identifies most duplicate pairs early in the detection process and in less time, while the duplicate count strategy with multi-record increase (DCS++) identifies more duplicates but takes more time. We therefore propose a system that combines the characteristics of both, so that it is less time-consuming and yields more accurate results than existing algorithms.
Keywords: Duplicate detection; windowing; blocking; pay-as-you-go; progressiveness; data cleaning; DCS++.
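The abstract does not describe the combined algorithm itself, so the following is only a simplified sketch of the DCS++ windowing idea it builds on, not the authors' method. Records are sorted by a key and each record is compared with the next w-1 records. When a duplicate is found, the window is extended to cover the w-1 records after that duplicate, and the duplicate is skipped as a window start. The window also grows by one while the duplicates-per-comparison ratio stays at or above a threshold phi. Duplicate clusters come from a transitive closure (here a union-find). The function name `dcs_plus_plus`, the toy `same_surname` rule, and the default `phi = 1/(w-1)` are illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple


def dcs_plus_plus(
    records: Sequence[str],
    key: Callable[[str], str],
    is_duplicate: Callable[[str, str], bool],
    w: int = 3,
    phi: Optional[float] = None,
) -> Tuple[List[Set[int]], int]:
    """Simplified DCS++ sketch (hypothetical helper).

    Returns (clusters of original record indices, number of comparisons).
    """
    if w < 2:
        raise ValueError("window size w must be at least 2")
    if phi is None:
        phi = 1.0 / (w - 1)  # assumed default threshold
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    n = len(order)
    parent = list(range(len(records)))  # union-find over original indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    skip: Set[int] = set()  # sorted positions already detected as duplicates
    comparisons = 0
    for i in range(n):
        if i in skip:
            continue  # its neighbours were already covered by an extended window
        end = min(i + w - 1, n - 1)  # last sorted position in the current window
        dups = comps = 0
        k = i + 1
        while k <= end:
            comps += 1
            if is_duplicate(records[order[i]], records[order[k]]):
                dups += 1
                skip.add(k)
                parent[find(order[k])] = find(order[i])
                # Multi-record increase: look w-1 records past this duplicate.
                end = min(max(end, k + w - 1), n - 1)
            if k == end and end < n - 1 and dups / comps >= phi:
                end += 1  # keep growing while the duplicate ratio is high
            k += 1
        comparisons += comps

    # Transitive closure: group records sharing a union-find root.
    groups: dict = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), set()).add(idx)
    return [g for g in groups.values() if len(g) > 1], comparisons


def same_surname(a: str, b: str) -> bool:
    # Toy similarity rule (assumption): duplicates share the first token.
    return a.split()[0] == b.split()[0]


records = ["smith john", "jones mary", "smith jon", "smith jo", "jones marie", "brown al"]
clusters, comparisons = dcs_plus_plus(records, key=lambda r: r, is_duplicate=same_surname, w=3)
print(clusters, comparisons)
```

On this toy input, plain sorted-neighbourhood windowing with w = 3 would make 9 comparisons. The sketch finds the same two clusters with fewer, because records already detected as duplicates do not start windows of their own.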