Basic article information

  • Title: An instrumental variable approach to estimation of match probabilities or precision in linked data
  • Author: James Doidge
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Year of publication: 2019
  • Volume: 4
  • Issue: 3
  • Pages: 1-1
  • DOI: 10.23889/ijpds.v4i3.1258
  • Publisher: Swansea University
  • Abstract:

Background with rationale: While probabilistic linkage methods are ostensibly based on the probabilities of record pairs being matches (i.e. their marginal precision or positive predictive value), in practice they are used principally for ranking candidate links and fall short of supporting estimation of absolute probabilities. A few variations on Fellegi and Sunter’s framework have been proposed to better accommodate the dependencies that limit transformation of match weights into match probabilities, but there are almost no alternative frameworks for match probability estimation.

Main Aim: To explore the feasibility, accuracy and limitations of a novel instrumental variable approach to estimation of match probabilities for use in either probabilistic record linkage or evaluation of linkage error.

Methods/Approach: Using both simulated data and a gold standard (labelled) dataset derived from real-world linked data, I assessed the accuracy of match probability estimation for a range of potential instruments and compared results to estimates produced using conventional probabilistic techniques.

Results: The technique involves trading the potential value of one matching variable in discriminating between candidate links for improved estimation of match probabilities within groups of otherwise similar candidates. Analysis of simulated data confirmed the theoretical validity of the approach in supporting unbiased estimation of match probabilities despite dependencies between other matching variables. Analysis of real-world data demonstrated feasibility in terms of the availability of real-world instruments that provided sufficiently accurate estimation in groups of candidate links above a minimum size. Invalid instruments produced estimates that could be strongly biased.

Conclusion: These early results are promising but the general availability of valid instruments, their ‘affordability’ in terms of sacrificed discrimination, and means for identifying valid instruments remain unclear. However, this approach represents a new variety of tool for the data linker’s toolkit, which may provide a useful angle on an otherwise difficult-to-estimate parameter and have applications yet to be envisaged.
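
For orientation only, the conventional step that the Background refers to, transforming match weights into match probabilities, can be sketched as below. This is not the instrumental variable method described in the abstract, whose details are not given here. It is the textbook Fellegi-Sunter conversion, which assumes that agreement on each matching variable is conditionally independent given match status. The function names, the m- and u-probabilities, and the prior are all hypothetical values chosen for illustration.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Sum the log2 likelihood ratios over matching variables.

    m_probs[i] = P(field i agrees | pair is a match)
    u_probs[i] = P(field i agrees | pair is a non-match)
    Assumes conditional independence between fields.
    """
    total = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        if agree:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight, prior):
    """Convert a match weight to a posterior match probability.

    prior is the assumed proportion of candidate pairs that are matches.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


# Hypothetical parameters for three matching variables.
m_probs = [0.95, 0.90, 0.98]
u_probs = [0.01, 0.05, 0.10]
prior = 0.001

w_full = match_weight([True, True, True], m_probs, u_probs)
w_partial = match_weight([True, False, True], m_probs, u_probs)
p_full = match_probability(w_full, prior)
p_partial = match_probability(w_partial, prior)
print(f"full agreement:    weight={w_full:.2f}, probability={p_full:.4f}")
print(f"partial agreement: weight={w_partial:.2f}, probability={p_partial:.4f}")
```

When matching variables are in fact dependent, the summed weight overstates or understates the evidence, so the resulting probabilities can be miscalibrated even though the ranking of candidate links may remain useful. That is the limitation the abstract takes as its starting point.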