Abstract: High-dimensional
longitudinal data arise frequently in biomedical and genomic research. It is
important to select relevant covariates when the dimension of the parameters
diverges as the sample size
increases. We consider the problem of variable selection in high-dimensional
linear models with longitudinal data. A new variable selection procedure is
proposed using the smooth-threshold generalized estimating equation and
quadratic inference functions (SGEE-QIF) to incorporate correlation
information. The proposed procedure automatically eliminates inactive
predictors by setting the corresponding parameters to zero, and
simultaneously estimates the nonzero regression coefficients by solving the
SGEE-QIF. The proposed procedure avoids solving a convex optimization problem and is flexible and easy to implement. We establish
its asymptotic properties in a high-dimensional framework where the number of covariates increases with the number of
clusters. Extensive Monte
Carlo simulation studies are conducted to examine the finite sample performance
of the proposed variable selection
procedure.
Keywords: Variable Selection; Diverging Number of Parameters; Longitudinal Data; Quadratic Inference Functions; Generalized Estimating Equation
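
To make the smooth-threshold idea concrete, the following minimal sketch shows how such an estimating equation can zero out inactive coefficients while estimating the remaining ones without any convex optimization. It is an illustration under simplifying assumptions, not the SGEE-QIF procedure itself: it uses an independence working correlation and the ordinary least-squares score in place of the quadratic inference function, it assumes more observations than covariates so that an initial least-squares estimate exists, and it assumes the commonly used weight form delta_j = min(1, lam / |b_j|^(1 + gamma)), where b_j is the initial estimate. The names `smooth_threshold_fit`, `solve_linear`, `simulate`, `lam`, and `gamma` are hypothetical.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def smooth_threshold_fit(X, y, lam, gamma=1.0):
    """Illustrative smooth-threshold estimating equation for a linear model.

    Solves (I - D) X'(y - X beta) - D beta = 0 with D = diag(delta),
    assuming an independence working correlation (a simplification).
    """
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]

    # Initial estimate: ordinary least squares (assumes n > p).
    init = solve_linear(XtX, Xty)

    # Assumed weight form: delta_j = min(1, lam / |init_j|^(1 + gamma)).
    delta = [1.0 if b == 0 else min(1.0, lam / abs(b) ** (1 + gamma))
             for b in init]

    # delta_j = 1 forces beta_j = 0; solve only for the remaining indices.
    active = [j for j in range(p) if delta[j] < 1.0]
    beta = [0.0] * p
    if active:
        A = [[(1 - delta[j]) * XtX[j][k] + (delta[j] if j == k else 0.0)
              for k in active] for j in active]
        rhs = [(1 - delta[j]) * Xty[j] for j in active]
        for j, value in zip(active, solve_linear(A, rhs)):
            beta[j] = value
    return beta, delta


def simulate(n, beta, seed=0, noise=0.5):
    """Generate a toy data set with independent standard normal covariates."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + noise * rng.gauss(0, 1)
         for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate(200, [2.0, 0.0, -1.5, 0.0, 0.0], seed=1)
    beta, delta = smooth_threshold_fit(X, y, lam=0.05)
    print("estimated coefficients:", beta)
    print("threshold weights:", delta)
```

In this sketch a coefficient whose weight reaches 1 is removed exactly, and the remaining coefficients come from a single linear solve, which mirrors the claim that selection and estimation happen simultaneously. Replacing the least-squares score with a score that incorporates within-cluster correlation would be needed to reflect the longitudinal setting.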