
Article Information

  • Title: Unsupervised Detection of Duplicates in User Query Results using Blocking
  • Authors: Dr. B. Vijaya Babu; K. Jyotsna Santoshi
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2014
  • Volume: 5
  • Issue: 3
  • Pages: 3514-3520
  • Publisher: TechScience Publications
  • Abstract: Unsupervised learning explores unlabeled data to find intrinsic or hidden structure. Duplicate detection identifies records that represent the same real-world entity. In data mining, the amount of available data is growing exponentially. Linking or matching records from multiple web databases is therefore a major challenge, because it requires comparing each record in one database with all the records in the other databases. Supervised learning methods fail in the web database scenario because the records to be matched are query dependent. To handle this setting, previous work proposed UDD, an online, untrained record linkage method. UDD effectively identifies record pairs from multiple web databases that represent the same entity, but it is time consuming. This paper focuses on improving the performance of UDD by adding a blocking step, in which Canopy Clustering, a computationally cheap clustering approach, is used. Thus, before records are classified as duplicates or non-duplicates, clustering is performed and blocks of candidate record pairs are generated. Experimental results show that blocking improves the performance of UDD in the web database scenario.
  • Keywords: Unsupervised learning; Record linkage; Duplicate Detection; Record Matching; Blocking; Canopy; Clustering; Web database; Query result
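
The blocking step described in the abstract can be illustrated with a minimal sketch of canopy clustering. This is not the authors' implementation: the token-based Jaccard distance, the function names (`jaccard_distance`, `canopy_blocks`, `candidate_pairs`), the threshold values, and the sample records are all assumptions chosen for illustration. The idea is that a cheap distance groups records into possibly overlapping canopies, and only pairs that share a canopy are passed on to the more expensive duplicate classifier.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Cheap distance between two records: 1 minus the Jaccard overlap of their word tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def canopy_blocks(records, loose=0.8, tight=0.4):
    """Group record indices into possibly overlapping canopies.

    loose and tight are hypothetical distance thresholds (0 <= tight <= loose).
    """
    if not 0 <= tight <= loose:
        raise ValueError("thresholds must satisfy 0 <= tight <= loose")
    remaining = list(range(len(records)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = [jaccard_distance(records[center], r) for r in records]
        # Every record within the loose threshold joins this canopy.
        canopies.append([i for i in range(len(records)) if dist[i] <= loose])
        # Records within the tight threshold can no longer become centers.
        remaining = [i for i in remaining if dist[i] > tight]
    return canopies


def candidate_pairs(canopies):
    """Return the set of index pairs that share at least one canopy."""
    pairs = set()
    for canopy in canopies:
        pairs.update(combinations(sorted(canopy), 2))
    return pairs


# Hypothetical query results merged from several web databases.
sample_records = [
    "apple iphone 6 16gb",
    "apple iphone 6 16gb black",
    "samsung galaxy s5",
    "galaxy s5 samsung phone",
]

if __name__ == "__main__":
    blocks = canopy_blocks(sample_records)
    print(sorted(candidate_pairs(blocks)))
```

On these four sample records, only two of the six possible record pairs fall inside a shared canopy, so the classifier would compare two pairs instead of six. The thresholds control the trade-off: a larger loose threshold produces bigger canopies and more candidate pairs, which lowers the risk of missing true duplicates at the cost of more comparisons.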