
Article information

  • Title: Measuring precision for deterministic and probabilistic record linkage
  • Authors: Bindi Kindermann; James Chipperfield; Noel Hansen
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Publication year: 2017
  • Volume: 1
  • Issue: 1
  • Pages: 1-1
  • DOI: 10.23889/ijpds.v1i1.110
  • Publisher: Swansea University
  • Abstract:
    Objectives: Various organisations are increasingly linking administrative, survey, and census data to enhance dimensions such as time and breadth or depth of detail. Because a unique person identifier is often not available, records belonging to two different people may be incorrectly linked. Estimating the proportion of links that are correct, called precision, is difficult because, even after clerical review, some uncertainty remains about whether a link is in fact correct. This presentation proposes methods for estimating precision when using either deterministic (rules-based) or probabilistic linkage. These methods are model-based and do not require clerical review. Their main uses are to estimate: (1) precision during the linking process, which helps refine how linkage is carried out, such as the choice of linking variables and weight thresholds; and (2) precision after the files are linked, which provides a useful "quality indicator" for the linked data.
    Approach: Two methods of estimating precision are described. (1) Simulation: the linking process, whether probabilistic or deterministic, is simulated many times. The key step is simulating the agreement pattern between data sets, based on underlying probabilities. (2) An algebraic estimator: this applies to deterministic linking only and provides a quicker way of estimating precision. Both methods are investigated in two studies: (i) synthetic data and (ii) real data (death registrations linked to census data).
    Results: The estimators perform very well on both the synthetic and the real data, even when assumptions about the independence of linking variables are violated. This suggests that the estimators are robust against moderate violations of these assumptions.
    Conclusion: The proposed estimators of precision are a very useful addition to the record linkage tool kit, providing methodical, faster, and cheaper alternatives to many present strategies that rely on clerical review. Estimates of precision are useful in the planning, processing, and analysis of record linkage activities.
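
The simulation idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' estimator: the abstract gives no formulas, so everything here is an assumption made for illustration. It assumes two files holding the same `n_records` people, linking variables that agree independently, a hypothetical per-variable agreement probability `m_probs` for true matches and `u_probs` for non-matches, and a deterministic rule that links a pair only when every variable agrees. The function name `simulate_precision` is also hypothetical.

```python
import numpy as np


def simulate_precision(n_records, m_probs, u_probs, n_sims=200, seed=0):
    """Average simulated precision of an 'all variables agree' linking rule.

    Assumptions (illustrative only, not taken from the source):
    - record i in file A truly matches record i in file B;
    - variable k agrees with probability m_probs[k] for a true match
      and u_probs[k] for a non-match, independently across variables.
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(m_probs, dtype=float)
    u = np.asarray(u_probs, dtype=float)

    # True-match indicator for every (A, B) record pair.
    is_match = np.eye(n_records, dtype=bool)
    # Agreement probability per pair and variable, shape (n, n, K).
    p_agree = np.where(is_match[:, :, None], m, u)

    estimates = []
    for _ in range(n_sims):
        # Simulate the agreement pattern for every pair and variable.
        agree = rng.random(p_agree.shape) < p_agree
        # Deterministic rule: link the pair if all variables agree.
        linked = agree.all(axis=2)
        n_links = linked.sum()
        if n_links == 0:
            continue  # precision is undefined when nothing is linked
        estimates.append((linked & is_match).sum() / n_links)

    return float(np.mean(estimates)) if estimates else float("nan")


if __name__ == "__main__":
    print(simulate_precision(100, [0.9, 0.9, 0.9], [0.1, 0.1, 0.1]))
```

Raising the `u_probs` values or dropping a linking variable in this sketch lowers the simulated precision, which is the kind of comparison the abstract describes for choosing linking variables during the linking process.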