Abstract: We present a joint modeling approach for multiple imputation of missing continuous
and categorical variables using Bayesian mixture models. The approach extends
the idea of focused clustering, in which one separates variables into two sets before
estimating the mixture model. Focus variables include variables with high rates of
missingness and possibly other variables that could help improve the quality of the
imputations. Non-focus variables comprise the remainder. In this way, one can use a
rich sub-model for the focus set and a simpler model for the non-focus set, thereby
concentrating fitting power on the variables with the highest rates of missingness.
We present a procedure for specifying which variables with low rates of missingness
to include in the focus set. We examine the performance of the imputation procedure
using simulation studies based on artificial data and on data from the American
Community Survey.