Article Information

  • Title: Hidden Markov Model as a Tool for Analysis of Temporal Dynamic Record Deduplication
  • Authors: R. Parimala Devi; X. Agnes Kalarani
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2017
  • Volume: 8
  • Issue: 3
  • Pages: 352-356
  • Publisher: TechScience Publications
  • Abstract: Record deduplication is one of the challenging research areas in data mining. In most organizations, storage systems hold duplicate copies of many pieces of data. Data deduplication is a dedicated data compression method that removes duplicate copies of repeating data. Previous research used genetic programming based record deduplication, which combined various pieces of evidence extracted from the data content. However, the true positive level of that system is low, which degrades the performance of the record deduplication system. To solve this problem, a Hidden Markov Model (HMM) based record deduplication method is proposed. In the HMM, records with different attributes are called states, and the similarity functions between pairs of records are called transitions. The attribute information of the data records is cleaned, standardised, and then processed through Hidden Markov Models (HMMs). The performance of the system is evaluated using the Restaurants data set and the Cora Bibliographic data set. The output of the HMM-based method is the set of duplicate and non-duplicate data records. The system improves the true positive level.
  • Keywords: Record Deduplication; Hidden Markov Model; Genetic Programming
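
The mapping described in the abstract (records as states, pairwise similarity as transitions) can be illustrated with a minimal sketch. It is not the authors' implementation and it is not a full HMM, since it has no hidden states or emission probabilities; it only shows one way to turn pairwise similarity scores into row-normalized transition weights. The Jaccard token similarity, the 0.5 threshold, the function names, and the sample records are all assumptions made for illustration.

```python
def tokens(record):
    """Lowercase every attribute value and split it into a set of word tokens."""
    return {tok for value in record.values() for tok in value.lower().split()}


def jaccard(a, b):
    """Jaccard similarity between the token sets of two records (assumed metric)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def transition_matrix(records):
    """Return (similarities, transitions); each record is treated as one state.

    Transition weights are the pairwise similarities, normalized per row so
    that each non-empty row sums to 1. Self-similarity is set to 0.
    """
    n = len(records)
    sims = [
        [jaccard(records[i], records[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    transitions = []
    for row in sims:
        total = sum(row)
        transitions.append([s / total for s in row] if total else [0.0] * n)
    return sims, transitions


def likely_duplicates(sims, threshold=0.5):
    """Pairs (i, j), i < j, whose similarity reaches the assumed threshold."""
    n = len(sims)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if sims[i][j] >= threshold]


# Hypothetical, hand-made records; not taken from the Restaurants or Cora data sets.
records = [
    {"name": "Blue Door Bistro", "city": "Springfield"},
    {"name": "Blue Door Bistroo", "city": "Springfield"},
    {"name": "Green Leaf Cafe", "city": "Riverton"},
]

sims, transitions = transition_matrix(records)
pairs = likely_duplicates(sims)
print(pairs)
```

With these sample records, only the first two share enough tokens to be flagged as a candidate duplicate pair; the third record has no tokens in common with the others, so its transition row stays all zeros.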