Basic article information

  • Title: Record Matching Over Query Results Using Fuzzy Ontological Document Clustering
  • Authors: V.Vijayaraja ; R.Prasanna Kumar ; M.A.Mukunthan
  • Journal: International Journal on Computer Science and Engineering
  • Print ISSN: 2229-5631
  • Electronic ISSN: 0975-3397
  • Publication year: 2011
  • Volume: 3
  • Issue: 02
  • Pages: 926-932
  • Publisher: Engg Journals Publications
  • Abstract: Record matching is an essential step in duplicate detection, as it identifies records that represent the same real-world entity. Supervised record matching methods require users to provide training data and therefore cannot be applied to web databases, where query results are generated on the fly. To overcome this problem, a new record matching method named Unsupervised Duplicate Elimination (UDE) is proposed for identifying and eliminating duplicates among records in dynamic query results. The central idea is to adjust the weights of record fields when calculating similarities among records. Two classifiers, a weight component similarity summing classifier and a support vector machine (SVM) classifier, are employed iteratively within UDE. The first classifier uses the field weights to match records from different data sources. With the matched records as the positive set and the non-duplicate records as the negative set, the second classifier identifies new duplicates. In addition, a new methodology for automatically interpreting and clustering knowledge documents using an ontology schema is presented, and a fuzzy logic control approach is used to match suitable document cluster(s) to given patents based on their derived ontological semantic webs. Thus, the paper takes advantage of the similarity among records from web databases to solve the online duplicate detection problem.
  • Keywords: Ontology schema; record matching; query results; SVM; UDE; duplicate detection
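
The abstract describes a first classifier that sums per-field similarities scaled by field weights. The sketch below illustrates that general idea only and is not the paper's implementation: the function names, the use of `difflib.SequenceMatcher` as the string similarity measure, the weight values, and the 0.85 threshold are all assumptions made for illustration. The iterative weight adjustment and the SVM stage are not shown.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    difflib's ratio is an assumed stand-in; the abstract does not say
    which field-level similarity measure the paper uses.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Sum the per-field similarities, each scaled by its normalized weight."""
    total = sum(weights.values())
    return sum(
        (w / total) * field_similarity(rec_a[field], rec_b[field])
        for field, w in weights.items()
    )


def is_duplicate(rec_a: dict, rec_b: dict, weights: dict,
                 threshold: float = 0.85) -> bool:
    """Flag a record pair as duplicates when the weighted score reaches
    the (hypothetical) threshold."""
    return weighted_similarity(rec_a, rec_b, weights) >= threshold


if __name__ == "__main__":
    # Hypothetical records from two data sources and hypothetical weights.
    r1 = {"title": "Data Mining: Concepts and Techniques", "author": "J. Han"}
    r2 = {"title": "Data mining - concepts and techniques", "author": "Han, J."}
    field_weights = {"title": 2.0, "author": 1.0}
    print(weighted_similarity(r1, r2, field_weights))
    print(is_duplicate(r1, r2, field_weights))
```

In a setup like the one the abstract outlines, pairs flagged this way could serve as the positive set for a second classifier, with the field weights revised between iterations.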