Abstract: Optimal risk and preventive patterns are itemsets that identify characteristics of cohorts of individuals who are significantly disproportionately represented in the abnormal and normal groups. In this paper, we propose a new classifier named ORPSW (Optimal Risk and Preventive Sets with Weights) that classifies gene expression data based on optimal risk and preventive patterns. The proposed method was tested on four benchmark gene expression data sets and compared with three state-of-the-art classifiers: C4.5, Naive Bayes, and SVM. The experiments show that the ORPSW classifier is generally more accurate than the C4.5 and Naive Bayes classifiers and is comparable to the SVM classifier. Because accuracy is sensitive to the prior class distribution, we also used the false positive rate (FPR) and the false negative rate (FNR) to better characterize classifier performance. The ORPSW classifier also performs very well under these measures. In addition, it identifies genes that are differentially expressed across classes, which helps explain the classification process.
Keywords: Optimal risk and preventive patterns; Weight; Gene expression data; Classifier
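The abstract reports FPR and FNR alongside accuracy because accuracy depends on the class prior. The minimal sketch below illustrates that point. It is not the paper's evaluation code. It assumes the abnormal class is labeled 1 and treated as positive, and it uses a hypothetical skewed test set with a trivial always-"normal" predictor. The function name `rates` and the data are illustrative only.

```python
from typing import Dict, Sequence


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, FPR and FNR, treating label 1 (abnormal) as positive.

    Assumption: binary labels 0 (normal) and 1 (abnormal).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal predicted abnormal
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal predicted normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal predicted abnormal
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal predicted normal
    n = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / n,
        # FPR = FP / (FP + TN); reported as 0.0 if there are no negatives
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # FNR = FN / (FN + TP); reported as 0.0 if there are no positives
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical skewed test set: 90 normal samples, 10 abnormal samples.
y_true = [0] * 90 + [1] * 10
# A trivial classifier that always predicts "normal".
y_pred = [0] * 100

print(rates(y_true, y_pred))  # → {'accuracy': 0.9, 'fpr': 0.0, 'fnr': 1.0}
```

Accuracy alone suggests this trivial predictor is good (0.9). However, an FNR of 1.0 shows that it misses every abnormal sample. Reporting both error rates exposes this failure independently of the class prior.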