Article information

  • Title: Evaluation measure for group-based record linkage
  • Authors: Charini Nanayakkara; Peter Christen; Thilina Ranbaduge
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Publication year: 2019
  • Volume: 4
  • Issue: 1
  • Pages: 1-12
  • DOI: 10.23889/ijpds.v4i1.1127
  • Publisher: Swansea University
  • Abstract: Traditionally, record linkage is concerned with linking pairs of records across data sets and the classification of such pairs into matches (assumed to refer to the same individual) and non-matches (assumed to refer to different individuals). Increasingly, however, more complex data sets are being linked where often the aim is to identify groups, or clusters, of records that refer to the same individual or to a group of related individuals. Examples include finding the records of all births to the same parents or all medical records generated by members of the same family. When ground truth data in the form of known true matches and non-matches are available, then linkage quality is traditionally evaluated based on the classified versus the true matches (links) using measures such as precision (also known as the positive predictive value) and recall (also known as sensitivity or the true positive rate). The quality of clusters generated in record linkage is of high importance, since the comparison of different linkage methods is largely based on the values obtained by such evaluation measures. However, minimal research has been conducted thus far to evaluate the suitability of existing evaluation measures in the context of linking groups of records. As we show, evaluation measures such as precision and recall are not suitable for evaluating groups of linked records because they evaluate the quality of individually linked record pairs rather than the quality of records grouped into clusters. We highlight the shortcomings of traditional evaluation measures and then propose a novel approach to evaluate cluster quality in the context of group-based record linkage. We empirically evaluate our proposed approach using real-world data and show that it better reflects the quality of clusters generated by a group-based record linkage technique.
  • Keywords: clustering; birth-bundling; historical record linkage
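
The abstract contrasts pair-level measures (precision and recall over linked record pairs) with the quality of whole clusters. The sketch below illustrates only the traditional pair-level computation on a toy clustering. It is not the measure proposed in the article, which this record does not describe. The function names (`cluster_pairs`, `pairwise_precision_recall`) and the record identifiers (`r1` to `r6`) are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """Return the set of unordered record pairs implied by a clustering."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Traditional pair-level precision and recall for two clusterings."""
    pred_pairs = cluster_pairs(predicted)
    true_pairs = cluster_pairs(truth)
    true_positives = len(pred_pairs & true_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Toy ground truth (assumed): one group of four records, one group of two.
truth = [{"r1", "r2", "r3", "r4"}, {"r5", "r6"}]
# Toy linkage result (assumed): record r4 was left out of its true group.
predicted = [{"r1", "r2", "r3"}, {"r4"}, {"r5", "r6"}]

precision, recall = pairwise_precision_recall(predicted, truth)
print(f"pairwise precision = {precision:.3f}, pairwise recall = {recall:.3f}")
```

In this toy case every predicted pair is a true pair, so pairwise precision is 1.0, while recall is 4/7 because the three pairs involving `r4` are missed. Neither number states directly that only one of the two true groups was recovered exactly, which is the kind of gap between pair-level and cluster-level quality the abstract points to.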