Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2016
Volume: 7
Issue: 2
Pages: 513-516
Publisher: TechScience Publications
Abstract: Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures such as XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability that two XML elements are duplicates, considering not only the data within the elements but also the way that data is structured. In addition, to improve the efficiency of the network evaluation, we present a novel pruning strategy capable of significant gains over the unoptimized version of the algorithm. Through experiments, we show that our algorithm achieves high precision and recall scores on several data sets. XMLDup also outperforms another state-of-the-art duplicate detection solution in terms of both efficiency and effectiveness.