Article Information

  • Title: Random Forests for Generating Partially Synthetic, Categorical Data
  • Authors: Gregory Caiola; Jerome P. Reiter
  • Journal: Transactions on Data Privacy
  • Print ISSN: 1888-5063
  • Online ISSN: 2013-1631
  • Year: 2010
  • Volume: 3
  • Issue: 1
  • Pages: 27-42
  • Publisher: IIIA-CSIC
  • Abstract:

    Several national statistical agencies are now releasing partially synthetic, public-use microdata. These comprise the units in the original database, with sensitive or identifying values replaced by values simulated from statistical models. Specifying synthesis models can be daunting in databases that include many variables of diverse types. These variables may be related in ways that are difficult to capture with standard parametric tools. In this article, we describe how random forests can be adapted to generate partially synthetic data for categorical variables. Using an empirical study, we illustrate that the random forest synthesizer can preserve relationships reasonably well while providing low disclosure risks. The random forest synthesizer has some appealing features for statistical agencies: it can be applied with minimal tuning, easily incorporates numerical, categorical, and mixed variables as predictors, operates efficiently in high dimensions, and automatically fits non-linear relationships.
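The general idea in the abstract (fit a random forest with the sensitive categorical variable as the outcome, then replace each record's value with a draw from the forest's predictive distribution while releasing all other variables unchanged) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: records are assumed to be dicts of string-valued categorical variables, only one variable is synthesized, splits are binary "feature == category" tests, and each synthetic value is sampled from the votes of the individual trees. The function names (`grow_tree`, `fit_forest`, `synthesize`) and parameters (`ntree`, `mtry`, `max_depth`, `min_leaf`) are hypothetical. Unlike the method described in the abstract, this sketch handles only categorical predictors.

```python
import random
from collections import Counter

# Hypothetical sketch: categorical (string) predictors only, one synthesized variable.


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(rows, ys, features, mtry, rng, min_leaf=2, depth=0, max_depth=6):
    """Grow one classification tree; each node tries a random subset of mtry features."""
    if len(set(ys)) == 1 or len(ys) <= min_leaf or depth >= max_depth:
        return {"label": Counter(ys).most_common(1)[0][0]}  # leaf: majority class
    best = None
    for f in rng.sample(features, min(mtry, len(features))):
        for v in sorted({r[f] for r in rows}):
            # Binary split on the test "feature f equals category v".
            left = [i for i, r in enumerate(rows) if r[f] == v]
            if not 0 < len(left) < len(rows):
                continue
            in_left = set(left)
            right = [i for i in range(len(rows)) if i not in in_left]
            score = (len(left) * gini([ys[i] for i in left])
                     + len(right) * gini([ys[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    if best is None:  # the sampled features cannot separate these rows
        return {"label": Counter(ys).most_common(1)[0][0]}
    _, f, v, left, right = best
    left_node, right_node = (
        grow_tree([rows[i] for i in idx], [ys[i] for i in idx],
                  features, mtry, rng, min_leaf, depth + 1, max_depth)
        for idx in (left, right)
    )
    return {"feature": f, "value": v, "left": left_node, "right": right_node}


def predict(tree, row):
    """Drop one record down a tree and return the class at its leaf."""
    while "label" not in tree:
        tree = tree["left"] if row[tree["feature"]] == tree["value"] else tree["right"]
    return tree["label"]


def fit_forest(rows, ys, features, ntree=50, seed=0):
    """Fit ntree trees, each on a bootstrap sample, with mtry = floor(sqrt(p))."""
    rng = random.Random(seed)
    mtry = max(1, int(len(features) ** 0.5))
    n = len(rows)
    forest = []
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(grow_tree([rows[i] for i in boot], [ys[i] for i in boot],
                                features, mtry, rng))
    return forest


def synthesize(records, target, ntree=50, seed=0):
    """Return copies of records whose `target` value is replaced by a synthetic draw.

    All other variables are released unchanged (partial synthesis).
    """
    features = [k for k in records[0] if k != target]
    ys = [r[target] for r in records]
    forest = fit_forest(records, ys, features, ntree=ntree, seed=seed)
    rng = random.Random(seed + 1)
    synthetic = []
    for r in records:
        votes = [predict(tree, r) for tree in forest]
        new = dict(r)
        new[target] = rng.choice(votes)  # P(category) = its share of the tree votes
        synthetic.append(new)
    return synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    data = []
    for _ in range(200):
        educ = rng.choice(["hs", "college", "grad"])
        region = rng.choice(["north", "south"])
        # Toy association: income depends on education, with some noise.
        income = "high" if educ != "hs" and rng.random() < 0.8 else "low"
        data.append({"educ": educ, "region": region, "income": income})
    released = synthesize(data, target="income")
    print(Counter(r["income"] for r in data), Counter(r["income"] for r in released))
```

Sampling from tree votes, rather than taking the majority class, is what keeps some randomness in the released values. Always releasing the most likely category would reproduce the original value for most records and defeat the disclosure protection. A real deployment would more likely rely on an established random forest library than on this toy tree grower.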
