Journal:International Journal of Computer Trends and Technology
Electronic ISSN:2231-2803
Publication year:2013
Volume:4
Issue:10-1
Publisher:Seventh Sense Research Group
Abstract:Record matching, which identifies the records that represent the same entity, plays a key role in data integration. Most previous methods share a main drawback: they require the user to provide training data up front. Such methods are not applicable to Web databases, where the records to be matched are query results generated dynamically on the fly. These records are query-dependent, so a prelearned method that uses training examples from previous query results may fail on the results of a new query. To address the record matching problem for Web databases, we present UDD, an unsupervised, online record matching method which, for a given query, can effectively identify duplicates among the query result records of multiple Web databases. After the same-source duplicates are removed, the "presumed" nonduplicate records from the same source can be used as training examples, which relieves users of the burden of manually labeling training examples. Starting from this nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experiments show that UDD performs well in the Web database scenario, where the previous methods are not effective.
Keywords:Coherency; Continuous queries; Cost; Distributed query processing; Data dissemination; Performance
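
The abstract names a weighted component similarity classifier as one of the two cooperating classifiers. The following is a minimal Python sketch of that general idea only: each field pair gets a similarity score, the scores are combined as a weighted sum, and a pair is flagged as a duplicate when the sum reaches a threshold. The field names, the fixed weights, the threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration and are not taken from the paper. The SVM classifier and the iterative cooperation between the two classifiers are not shown.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities; weights are assumed to sum to 1."""
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in weights.items())


def find_duplicates(source_a: list, source_b: list, weights: dict, threshold: float) -> list:
    """Return index pairs (i, j) whose weighted similarity reaches the threshold."""
    return [
        (i, j)
        for i, rec_a in enumerate(source_a)
        for j, rec_b in enumerate(source_b)
        if weighted_similarity(rec_a, rec_b, weights) >= threshold
    ]


# Hypothetical query results returned by two Web databases.
source_a = [{"title": "Data Integration Basics", "author": "J. Smith"}]
source_b = [
    {"title": "data integration basics", "author": "J. Smith"},
    {"title": "Cooking for Beginners", "author": "A. Jones"},
]

# Assumed fixed weights and threshold; they are not learned here.
weights = {"title": 0.7, "author": 0.3}
pairs = find_duplicates(source_a, source_b, weights, threshold=0.85)
print(pairs)
```

In this sketch only the first record of `source_b` matches the record in `source_a`, because the two differ only in letter case. A fuller version would adjust the weights from the presumed nonduplicate examples rather than fixing them by hand.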