Abstract: Building predictive models for genomic mining requires feature selection as an essential preliminary step to reduce the large number of available variables. Feature selection is the process of selecting a generally smaller subset of variables (features) that can be considered the best, from a statistical point of view, with respect to the model employed for the analysis. In gene expression microarray data, selecting a small number of important genes not only makes data analysis more efficient but also aids biological interpretation. Microarray data typically contain several thousand genes (features) but only tens of samples.
Problems that can arise from the small sample size have not been well addressed in the literature. Our aim is to discuss some issues in feature selection applied to microarray data, with the goal of selecting the most important genes from a predictive point of view.