Basic article information

Title: The Impact of Synthetic Data Generation on Data Utility with Application to the 1991 UK Samples of Anonymised Records
Authors: Jennifer Taub; Mark Elliot; Joseph W. Sakshaug et al.
Journal: Transactions on Data Privacy
Print ISSN: 1888-5063
Electronic ISSN: 2013-1631
Publication year: 2020
Volume: 13
Issue: 1
Pages: 1-23
Publisher: IIIA-CSIC
Abstract: Synthetic data generation has been proposed as a flexible alternative to more traditional statistical disclosure control (SDC) methods for minimising disclosure risk. However, a barrier to the use of synthetic data is the uncertainty about the reliability and validity of the results that are derived from these data. Surprisingly, there has been a relative dearth of research on how to measure the utility of synthetic data. Utility measures developed to date have been either information theoretic abstractions or somewhat arbitrary collations of statistics, and replication of previously published results has been rare. In this paper, we adopt a methodology previously used by Purdam and Elliot (2007), in which they replicated published analyses using disclosure-controlled versions of the same microdata used in said analyses and then evaluated the impact of disclosure control on the analytic outcomes. We utilise the same studies as Purdam and Elliot, based on the 1991 UK Samples of Anonymised Records, to facilitate comparisons of synthetic data utility between different utility metrics.
Keywords: Synthetic data; CART; multiple imputation; utility metrics