Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Publication year: 2014
Volume: 9
Issue: 6
Pages: 347-360
DOI: 10.14257/ijmue.2014.9.6.33
Publisher: SERSC
Abstract: A crucial step in cancer diagnosis is selecting a small number of informative genes for accurate classification. This problem has become a major focus in the data mining of gene expression profiles. Many conventional classification methods perform very poorly, especially on data with a large number of cancer types. Here, we propose a new approach for gene selection and multi-cancer classification based on step-by-step improvement of classification performance (SSiCP). The SSiCP gene selection algorithms were evaluated on the NCI60 and GCM benchmark datasets, achieving accuracies of 96.6% and 95.5%, respectively, in 10-fold cross-validation. Furthermore, SSiCP outperformed recently published algorithms when applied to two other multi-cancer datasets. Computational evidence indicated that SSiCP can effectively avoid overfitting. Compared with various gene selection algorithms, SSiCP is simple to implement, and many of the genes it selects are shown to be closely related to cancers.
Keywords: Multiclass cancer classification; gene expression profile; machine learning; data mining; gene selection