
Article Information

  • Title: High dimensional, robust, unsupervised record linkage
  • Authors: Sabyasachi Bera; Snigdhansu Chatterjee
  • Journal: Statistics in Transition
  • Print ISSN: 1234-7655
  • Electronic ISSN: 2450-0291
  • Year: 2020
  • Volume: 21
  • Issue: 4
  • Pages: 123-143
  • DOI: 10.21307/stattrans-2020-034
  • Publisher: Exeley Inc.
  • Abstract: We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.
  • Keywords: record linkage; principal components; high dimensional; robust
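The abstract does not spell out the robust principal component estimator. As a hedged illustration only, here is a minimal sketch of one well-known robust PCA technique, spatial-sign (spherical) PCA. It is not presented as the authors' method, and it omits both sparsity and the record linkage step. The sketch centres rows at the spatial median and rescales each centred row to unit length, so a few extreme observations cannot dominate. It then takes the leading eigenvectors of the resulting spatial sign covariance matrix as robust principal directions. The function names `spatial_median` and `spatial_sign_pca` are hypothetical.

```python
import numpy as np


def spatial_median(X, n_iter=200, tol=1e-8):
    """Weiszfeld iteration for the spatial (geometric) median of the rows of X."""
    m = np.median(X, axis=0)  # coordinatewise median as a starting point
    for _ in range(n_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero at a data point
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def spatial_sign_pca(X, k):
    """Return the spatial median and the top-k spatial-sign principal directions (columns)."""
    mu = spatial_median(X)
    Z = X - mu
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave rows that coincide with the centre at zero
    S = Z / norms  # spatial signs: each centred row projected onto the unit sphere
    C = S.T @ S / S.shape[0]  # spatial sign covariance matrix
    vals, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return mu, vecs[:, order]
```

The design choice is to bound each observation's influence: after the sign transform every row has norm one, so a small cluster of gross outliers shifts the directions only slightly. Ordinary sample-covariance PCA, by contrast, can be pulled entirely toward the outliers.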