Article Information

Title: Weighted random subspace method for high dimensional data classification
Authors: Xiaoye Li; Hongyu Zhao
Journal: Statistics and Its Interface
Print ISSN: 1938-7989
Online ISSN: 1938-7997
Publication year: 2009
Volume: 2
Issue: 2
Pages: 153-159
DOI: 10.4310/SII.2009.v2.n2.a5
Publisher: International Press
Abstract: High dimensional data, especially those emerging from genomics and proteomics studies, pose significant challenges to traditional classification algorithms because the performance of these algorithms may substantially deteriorate due to high dimensionality and the existence of many noisy features in these data. To address these problems, pre-classification feature selection and aggregating algorithms have been proposed. However, most feature selection procedures either fail to consider potential interactions among the features or tend to overfit the data. The aggregating algorithms, e.g. the bagging predictor, the boosting algorithm, the random subspace method, and the Random Forests algorithm, are promising in handling high dimensional data. However, there is a lack of attention to optimal weight assignments to individual classifiers, and this has prevented these algorithms from achieving better classification accuracy. In this article, we formulate the weight assignment problem and propose a heuristic optimization solution. We have applied the proposed weight assignment procedures to the random subspace method to develop a weighted random subspace method. Several public gene expression and mass spectrometry data sets at the Kent Ridge biomedical data repository have been analyzed by this novel method. We have found that significant improvement over the common equal weight assignment scheme may be achieved by our method.
Keywords: classification; aggregating algorithm; voting weight; random subspace projection