Journal: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Publication year: 2017
Volume: 114
Issue: 22
Pages: 5671-5676
DOI: 10.1073/pnas.1619944114
Language: English
Publisher: The National Academy of Sciences of the United States of America
Abstract: Combining genotypes across datasets is central in facilitating advances in genetics. Data aggregation efforts often face the challenge of record matching, the identification of dataset entries that represent the same individual. We show that records can be matched across genotype datasets that have no shared markers based on linkage disequilibrium between loci appearing in different datasets. Using two datasets for the same 872 people, one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications, we find that 90–98% of forensic STR records can be connected to corresponding SNP records and vice versa. Accuracy increases to 99–100% when ∼30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic in maintenance of databases containing even small numbers of markers, including databases of forensic significance.
Keywords: forensic DNA; genomic privacy; imputation; population genetics; record matching