Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Publication year: 2017
Volume: 5
Issue: 3
Page: 3964
DOI: 10.15680/IJIRCCE.2017.0503047
Publisher: S&S Publications
Abstract: Duplicate detection is the process of identifying multiple representations of the same real-world entity. Duplicate detection methods must now process larger datasets in shorter time, and maintaining the quality of the datasets in the presence of duplicated entities becomes increasingly difficult. This application focuses on duplicates in hierarchical data such as XML files. Duplicates are identified using the detection methods: the datasets are loaded into the application, and processing, extraction, cleaning, separation, and detection are carried out to remove the duplicated data. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
Keywords: Duplicate detection; entity resolution; progressiveness; data cleaning