
Article Information

  • Title: REC: fast sparse regression-based multicategory classification
  • Authors: Chong Zhang; Xiaoling Lu; Zhengyuan Zhu
  • Journal: Statistics and Its Interface
  • Print ISSN: 1938-7989
  • Electronic ISSN: 1938-7997
  • Publication year: 2017
  • Volume: 10
  • Issue: 2
  • Pages: 175-185
  • DOI: 10.4310/SII.2017.v10.n2.a2
  • Publisher: International Press
  • Abstract: Recent advances in technology enable researchers to gather and store enormous data sets with ultra-high dimensionality. In bioinformatics, microarray and next-generation sequencing technologies can produce data with tens of thousands of predictors of biomarkers. On the other hand, the corresponding sample sizes are often limited. For classification problems, to predict new observations with high accuracy, and to better understand the effect of predictors on classification, it is desirable, and often necessary, to train the classifier with variable selection. In the literature, sparse regularized classification techniques have been popular due to their ability to perform classification and variable selection simultaneously. Despite their success, such sparse penalized methods can be computationally slow when the dimension of the problem is ultra high. To overcome this challenge, we propose a new sparse REgression based multicategory Classifier (REC). Our method uses a simplex to represent different categories of the classification problem. A major advantage of REC is that the optimization can be decoupled into smaller independent sparse penalized regression problems, and hence solved by using parallel computing. Consequently, REC enjoys an extraordinarily fast computational speed. Moreover, REC is able to provide class conditional probability estimation. Simulated examples and applications on microarray and next-generation sequencing data suggest that REC is very competitive when compared to several existing methods.
  • Keywords: LASSO; parallel computing; probability estimation; simplex; variable selection
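
The simplex coding and decoupling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the vertex construction, the squared-error lasso objective, the coordinate-descent solver, the largest-inner-product prediction rule, and all names (simplex_vertices, lasso_cd, fit_rec_sketch, predict) are assumptions made for illustration. The idea shown is that each of the k classes is coded as a vertex of a regular simplex in R^(k-1), and each of the k-1 response coordinates is then fit by its own sparse regression, independently of the others.

```python
import numpy as np


def simplex_vertices(k):
    """Return a (k, k-1) array whose rows are unit-norm vertices of a regular simplex."""
    # Centering the standard basis of R^k gives k points that sum to zero;
    # the leading k-1 right singular vectors give coordinates in R^(k-1).
    centered = np.eye(k) - 1.0 / k
    _, _, vt = np.linalg.svd(centered)
    coords = centered @ vt[: k - 1].T
    return coords / np.linalg.norm(coords, axis=1, keepdims=True)


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - Xb (b starts at zero); astype makes a copy
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]  # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]  # add the updated coordinate back
    return b


def fit_rec_sketch(X, labels, k, lam):
    """Fit k-1 independent lasso regressions on simplex-coded responses."""
    W = simplex_vertices(k)
    Y = W[labels]  # (n, k-1): each row is the vertex of that sample's class
    mu = X.mean(axis=0)
    Xc = X - mu
    b0 = Y.mean(axis=0)
    # The k-1 problems share no parameters, so this loop could be dispatched
    # to separate workers; it is run serially here to keep the sketch short.
    B = np.column_stack(
        [lasso_cd(Xc, Y[:, q] - b0[q], lam) for q in range(k - 1)]
    )
    return W, mu, b0, B


def predict(model, X):
    """Assign each row to the class whose vertex has the largest inner product."""
    W, mu, b0, B = model
    F = (X - mu) @ B + b0
    return np.argmax(F @ W.T, axis=1)


# Toy data (hypothetical): 3 classes, 20 features, only features 0-2 informative.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 50)
X = rng.normal(size=(150, 20))
X[np.arange(150), labels] += 4.0
model = fit_rec_sketch(X, labels, k=3, lam=0.1)
print("training accuracy:", np.mean(predict(model, X) == labels))
print("nonzero coefficients per regression:", (model[3] != 0).sum(axis=0))
```

Because the vertices all have unit norm, choosing the largest inner product is the same as choosing the nearest vertex. The class conditional probability estimation mentioned in the abstract is not covered by this sketch.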