Article Information

  • Title: Research Ready Data Lakes: Protecting Privacy in Relatable Datasets
  • Authors: Robert McMillan; Maggie Reeves
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Publication year: 2019
  • Volume: 4
  • Issue: 3
  • Pages: 1-1
  • DOI: 10.23889/ijpds.v4i3.1266
  • Publisher: Swansea University
  • Abstract:

    Background with rationale: The Georgia Policy Labs' mission is to improve outcomes for children and families by producing rigorous research with long-term government partners. A key component of this model is secure access to research-ready, individual-level data from multiple sources, so that government agencies' questions can be answered within policy windows. Obtaining sensitive data from our partners requires significant relationship building, demonstrations of value, and assurances that we can mitigate all security and privacy concerns.

    Objectives: Securely transfer and de-identify disparate individual-level datasets containing personally identifiable information (PII) from public entities. Clean the data and store it in a pristine data lake, available for fast-turnaround research. Ensure that individual records can be matched across the datasets of disparate organizations.

    Approach: Our practices, infrastructure, data sharing agreements, and security are built to serve both researchers' need for data availability and the security standards that put our partners at ease. We highlight two solutions that address security concerns while supporting our researchers, and that other researchers working with sensitive data can adopt. First, we discuss our multiple tiers of data transfer and access, which remove risk from handling identifiable data. Second, we present the double-hash solution we created for a partner that was not willing to share PII. We share the source code for our SHA3-512 double-hash solution, which allows records to be matched across disparate datasets without our receiving sensitive PII elements.

    Results: We created reliable matching values without needing actual Social Security numbers or other PII on our side, which enabled a large school district to share its student-level data with us.

    Conclusion: Balancing security against easy access for researchers is a common source of friction. Our security setup and hashing solution allow others to remove this barrier for applied policy research.
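To illustrate how a double-hash linkage of this kind can work, here is a minimal Python sketch. It is not the authors' released source code. The two-pass design (a first SHA3-512 pass with a salt shared among data partners, then a second pass with a salt held only by the research team), the digit-only SSN normalization, and all function names and salt values are hypothetical assumptions.

```python
from __future__ import annotations

import hashlib
import re


def normalize_ssn(raw: str) -> str:
    """Keep only digits so that '123-45-6789' and '123456789' hash identically.

    Assumption: the identifier is a US-style 9-digit SSN.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit identifier")
    return digits


def sha3_512_hex(salt: bytes, value: str) -> str:
    """Return the hex-encoded SHA3-512 digest of salt || value."""
    return hashlib.sha3_512(salt + value.encode("utf-8")).hexdigest()


def partner_hash(raw_ssn: str, shared_salt: bytes) -> str:
    """First pass, intended to run on the partner's side so the raw SSN never leaves it.

    Every partner must use the same (hypothetical) shared salt so hashes line up.
    """
    return sha3_512_hex(shared_salt, normalize_ssn(raw_ssn))


def researcher_hash(first_pass: str, research_salt: bytes) -> str:
    """Second pass, run by the research team on the first-pass value it receives."""
    return sha3_512_hex(research_salt, first_pass)


def link_records(left: dict[str, dict], right: dict[str, dict]) -> dict[str, tuple[dict, dict]]:
    """Inner-join two datasets keyed by the double-hashed identifier."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


if __name__ == "__main__":
    shared = b"hypothetical-shared-salt"      # agreed among data partners
    private = b"hypothetical-research-salt"   # held only by the research team

    def key(raw: str) -> str:
        return researcher_hash(partner_hash(raw, shared), private)

    # Toy records from two hypothetical sources; the SSN formatting differs.
    district = {key("123-45-6789"): {"source": "district"}}
    agency = {key("123456789"): {"source": "agency"},
              key("987-65-4321"): {"source": "agency"}}

    linked = link_records(district, agency)
    print(len(linked))  # 1
```

A note on the design choice: a 9-digit SSN has only about one billion possible values, so an unsalted hash could be reversed by brute force; in this sketch the secrecy of the salts is what carries the protection. Prefixing a secret salt is a reasonable keyed construction for SHA3 because, unlike SHA-2, it is not vulnerable to length-extension attacks; HMAC or KMAC would be standard alternatives.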