Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Year: 2014
Volume: 7
Issue: 2
DOI: 10.14445/22312803/IJCTT-V7P105
Publisher: Seventh Sense Research Group
Abstract: Duplicate detection is an important task in data mining: it determines whether given data objects are duplicates, that is, multiple representations of the same real-world object. Duplicate detection can be applied in many settings and is especially important in databases. In this work, hierarchical duplicate detection, as used within a single relation, is applied to XML data. The existing system uses a method called XMLDup, which uses a Bayesian network to establish the conditional probability that two XML elements are duplicates. In the Bayesian-network-based system the conditional probability values are derived manually, which makes it less efficient than a machine-learning-based approach. The proposed system detects duplicates among XML data and XML objects whose input files use different structural representations. It derives the conditional probabilities by applying support vector machine (SVM) models, with their associated learning algorithms, to the XML data objects under comparison. The method takes a set of XML data as input and predicts the conditional probability value for each data item in the hierarchical structure. The proposed SVM-based classification performs better, giving more efficient and more effective duplicate detection.
Keywords: Duplicate detection; XML data; SVM classification; Bayesian networks; entity resolution.
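
The abstract describes comparing XML objects that have different structures and passing the result to a learned classifier. As a minimal sketch of what the input to such a classifier might look like, the Python below turns a pair of XML elements into a vector of per-field similarity scores. It is an illustration under assumptions and not the paper's method: the function names `leaf_values` and `similarity_features`, the example field names, and the use of `difflib` string similarity are all hypothetical choices.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_values(element):
    """Map each leaf tag to its lower-cased text values, ignoring nesting depth."""
    values = {}
    for node in element.iter():
        text = (node.text or "").strip()
        # A leaf is a node with no child elements and some text content.
        if len(node) == 0 and text:
            values.setdefault(node.tag, []).append(text.lower())
    return values


def similarity_features(xml_a, xml_b, tags):
    """Return one score in [0, 1] per tag; 0.0 when a tag is missing on either side."""
    a = leaf_values(ET.fromstring(xml_a))
    b = leaf_values(ET.fromstring(xml_b))
    features = []
    for tag in tags:
        best = 0.0
        # Keep the best match among all values that share this tag name.
        for value_a in a.get(tag, []):
            for value_b in b.get(tag, []):
                best = max(best, SequenceMatcher(None, value_a, value_b).ratio())
        features.append(best)
    return features


# Hypothetical records: the same movie stored with two different structures.
movie_a = "<movie><title>The Matrix</title><year>1999</year></movie>"
movie_b = (
    "<film><info><title>the matrix</title></info>"
    "<year>1999</year><director>Wachowski</director></film>"
)

features = similarity_features(movie_a, movie_b, ["title", "year", "director"])
print(features)
```

Vectors like this, labeled as duplicate or non-duplicate, are the kind of training data an SVM could learn from, in place of hand-set conditional probabilities.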