Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2017
Volume: 5
Issue: 2
Page: 1762
DOI: 10.15680/IJIRCCE.2017.0502098
Publisher: S&S Publications
Abstract: Duplicate detection is a serious problem in many applications, such as personal details management, customer affiliation management, and data mining. This survey reviews duplicate record detection techniques for both small and large datasets. To detect duplicates with shorter execution time and without degrading dataset quality, methods such as Progressive Blocking and Progressive Sorted Neighborhood are used. The Progressive Sorted Neighborhood Method (PSNM) is used in this model to detect duplicates using a parallel approach. The Progressive Blocking algorithm targets large datasets, where finding duplicates would otherwise take a very long time. These algorithms enhance the duplicate detection system and can double its efficiency compared with the conventional duplicate detection method. Several data analysis methods are also studied, along with various approaches to duplicate detection.
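The abstract names PSNM without describing its mechanics. The following is a minimal Python sketch of the core progressive sorted neighborhood idea, under stated assumptions: records are sorted by a key and then compared at increasing rank distance (1, 2, up to window - 1), so adjacent records, which are the most likely duplicates, are reported first. The function names `psnm` and `similar`, the sort key (`str.lower`), and the similarity test (a `difflib` ratio threshold of 0.8) are hypothetical choices for illustration. Refinements that fuller PSNM implementations may add, such as data partitioning and look-ahead, are omitted.

```python
from collections.abc import Callable, Iterator, Sequence
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical duplicate test: difflib similarity ratio above a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def psnm(
    records: Sequence[str],
    key: Callable[[str], str],
    window: int,
    is_duplicate: Callable[[str, str], bool],
) -> Iterator[tuple[str, str]]:
    """Yield duplicate pairs progressively: nearest sort neighbors first."""
    # Sort once so that likely duplicates end up close to each other.
    ordered = sorted(records, key=key)
    n = len(ordered)
    # Distance 1 compares adjacent records, distance 2 skips one record, and
    # so on up to window - 1, matching the comparisons of classic sorted
    # neighborhood but in a most-promising-first order.
    for dist in range(1, min(window, n)):
        for i in range(n - dist):
            a, b = ordered[i], ordered[i + dist]
            if is_duplicate(a, b):
                yield a, b


if __name__ == "__main__":
    people = ["Jon Smith", "Mary Jones", "Bob Lee", "John Smith", "Marie Jones"]
    for pair in psnm(people, key=str.lower, window=3, is_duplicate=similar):
        print(pair)
```

Because `psnm` is a generator, a caller can stop consuming pairs at any time (for example, when a time budget runs out) and still keep the duplicates most likely to be found early, which is the point of a progressive approach.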