
Article information

  • Title: New hard-thresholding rules based on data splitting in high-dimensional imbalanced classification
  • Authors: Arezou Mojiri; Abbas Khalili; Ali Zeinal Hamadani
  • Journal: Electronic Journal of Statistics
  • Print ISSN: 1935-7524
  • Publication year: 2022
  • Volume: 16
  • Issue: 1
  • Pages: 814-861
  • DOI: 10.1214/21-EJS1939
  • Language: English
  • Publisher: Institute of Mathematical Statistics
  • Abstract: In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in a population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and social sciences. In this paper, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that, due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, the LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of hard-thresholding rules based on a data splitting technique that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of the LDA in imbalanced cases. We evaluate the finite-sample performance of different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.
  • Keywords: 62H30; classification; high-dimensionality; imbalanced; linear discriminant analysis; thresholding