Article information

Title: Mining of Important Informative Genes and Classifier Construction for Cancer Dataset
Authors: Soumen Kumar Pati; Asit Kumar Das
Journal: International Journal on Soft Computing
Electronic ISSN: 2229-7103
Year of publication: 2012
Volume: 3
Issue: 3
DOI: 10.5121/ijsc.2012.3306
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Microarray technology is a useful technique for measuring the expression of thousands of genes or more simultaneously. One of the challenges in classifying cancer from high-dimensional gene expression data is selecting a minimal number of relevant genes that maximize classification accuracy. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust gene identification methods is essential. Many gene selection methods and corresponding classifiers have been proposed. In the proposed method, a single gene with high class-discrimination capability is selected, and classification rules for cancer are generated from gene expression profiles. The method first computes an importance factor for each gene in the experimental cancer dataset by counting the number of linguistic terms (defined in terms of different discrete quantities) with high class-discrimination capability according to their degree of dependency on the classes. The genes with high importance factors are then selected to form the initial reduct. Next, the traditional k-means clustering algorithm is applied to each selected gene of the initial reduct, and the misclassification error of each individual gene is computed. The final reduct is formed by selecting the most important genes, that is, those with the lowest misclassification errors. A classifier is then constructed from decision rules induced by the selected important (single) genes on the training dataset, and it is used to classify cancerous and non-cancerous samples of the experimental test dataset. The proposed method is tested on four publicly available cancer gene expression datasets. In most cases, accurate classification is obtained using just the important (single) genes identified, which are highly correlated with the pathogenesis of cancer. To demonstrate the robustness of the proposed method, its outcomes (correctly classified instances) are also compared with those of some existing well-known classifiers.
Keywords: Microarray cancer data; K-means algorithm; Gene selection; Classification rule; Cancer sample identification; Gene reducts.
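
Illustrative sketch: the abstract's second filtering step (k-means on each single gene, then ranking genes by misclassification error) might look roughly like the Python below. This is not the authors' implementation. The function names, the min/max centre initialisation, the choice of k = 2, the error definition (fraction of samples whose cluster disagrees with the class label under the better of the two cluster-to-class mappings), and the toy expression values are all assumptions made for illustration. The importance-factor step based on linguistic terms is not sketched, because the abstract does not give enough detail to reconstruct it.

```python
def two_means_1d(values):
    """Lloyd's k-means with k=2 on scalar values; returns a 0/1 label per value.

    Centres are initialised at the minimum and maximum value (an assumption).
    """
    centers = [min(values), max(values)]
    labels = None
    while True:
        # Assign every value to its nearest centre.
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:
            return labels
        labels = new_labels
        # Recompute each centre as the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)


def misclassification_error(values, classes):
    """Fraction of samples whose cluster disagrees with the 0/1 class label.

    Cluster ids are arbitrary, so the better of the two mappings is used.
    """
    labels = two_means_1d(values)
    mismatches = sum(lab != cls for lab, cls in zip(labels, classes))
    return min(mismatches, len(values) - mismatches) / len(values)


def rank_genes(expression, classes):
    """Return (gene, error) pairs sorted by ascending misclassification error."""
    errors = [
        (gene, misclassification_error(values, classes))
        for gene, values in expression.items()
    ]
    return sorted(errors, key=lambda pair: pair[1])


# Made-up toy data: two genes measured on six samples (0 = normal, 1 = cancer).
expression = {
    "gene_A": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "gene_B": [2.0, 4.1, 3.0, 2.2, 4.0, 3.1],
}
classes = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(expression, classes)
for gene, error in ranking:
    print(f"{gene}: misclassification error = {error:.3f}")
```

On this toy data, `gene_A` separates the two classes perfectly while `gene_B` does not, so `gene_A` would be the one kept in the final reduct under the assumptions above.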