Abstract: This study analyzes data from a nucleic acid amplification technique (polymerase chain reaction) comprising 23 different transcript variables. The data were collected to investigate the genetic mechanisms regulating chlamydial infection by measuring two different outcomes of murine C. pneumoniae lung infection: disease, expressed as lung weight increase, and C. pneumoniae load in the lung. A reduced model with fewer transcript variables of interest at the early infection stage is obtained using traditional variable selection methods (stepwise regression and partial least squares regression (PLS)) and modern ones (least absolute shrinkage and selection operator (LASSO), forward stagewise regression, and least angle regression (LARS)). Through these methods, the variables of interest are selected to investigate the genetic mechanisms that determine the outcomes of chlamydial lung infection. The transcript variables Tim3, GATA3, Lacf, and Arg2 (X4, X5, X8, and X13) are detected as the main variables of interest for studying C. pneumoniae disease (lung weight increase) or C. pneumoniae lung load. Models including these key variables may provide possible answers to the problem of the molecular mechanisms of chlamydial pathogenesis.
Keywords: LASSO; multicollinearity; partial least squares regression; stepwise regression; variable selection.