Abstract: This paper presents efficient techniques for constructing artificial neural network (ANN) models for classifying tissue samples using high-dimensional microarray cancer data. The quality of an ANN classification model is governed mainly by the quality of the gene predictors, the chosen activation function, and the number of interactive (hidden) neurons used to build the model. To date, no standard procedure has been reported in the literature for uniquely determining a suitable number of hidden layers that yields good ANN models in a given microarray cancer classification problem. To fill this gap, this work proposes a data-driven algorithm that efficiently determines the optimal number of hidden layers needed to yield efficient neural network models in any binary-response microarray tumour classification problem. The sub-sampling scheme of Monte Carlo cross-validation (MCCV) is adopted to construct the best neural network models within a specified range of numbers of hidden layers. Applications to simulated and real-life data sets show that the proposed method yields stable and efficient neural network classification models with good prediction results. The publicly available leukemia and diffuse large B-cell lymphoma (DLBCL) microarray cancer data sets are used to demonstrate the results.
Keywords: Artificial Neural Networks; average misclassification error rate; Monte Carlo cross-validation; Receiver Operating Characteristic Curve; AUC; Sensitivity; Specificity