Article information

  • Title: Feature screening for ultrahigh dimensional binary data
  • Authors: Guan, Guoyu ; Shan, Na ; Guo, Jianhua
  • Journal: Statistics and Its Interface
  • Print ISSN: 1938-7989
  • Online ISSN: 1938-7997
  • Publication year: 2017
  • Volume: 11
  • Issue: 1
  • Pages: 41-50
  • DOI: 10.4310/SII.2018.v11.n1.a4
  • Language: English
  • Publisher: International Press
  • Abstract: With the rapid development of information technology, the volume of ultrahigh dimensional binary data has increased dramatically, and feature screening has become a necessary step in analyzing such data. In this article, we propose an $L_0$-regularization feature screening procedure for the naive Bayes classifier, which is equivalent to the classical mutual information screening method. However, the tuning parameter in $L_0$-regularization is hard to select and lacks theoretical support. To address this, a BIC-type criterion is applied to identify important features. Moreover, the asymptotic properties of the proposed method are investigated theoretically under some mild assumptions. Lastly, its performance is confirmed numerically on simulated data, and a real example of Chinese document classification is presented for illustration.
  • Keywords: feature screening; $L_0$-regularization; naive Bayes; screening consistency
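
The abstract states that the proposed procedure is equivalent to classical mutual information screening. As a rough illustration of that classical idea only, the sketch below scores each binary feature by its empirical mutual information with the class label and keeps the highest-scoring ones. The function names `mutual_information` and `screen_features` are hypothetical, and the fixed top-k cutoff is an assumption made for the example; it stands in for, and does not reproduce, the BIC-type criterion of the article.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each (feature value, label) pair
    px = Counter(x)             # marginal counts of feature values
    py = Counter(y)             # marginal counts of labels
    mi = 0.0
    for (a, b), c in joint.items():
        # (c/n) * log( (c/n) / ((px/n) * (py/n)) ), simplified to avoid extra divisions
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def screen_features(X, y, k):
    """Rank features by mutual information with y and keep the top k.

    X is a list of rows, each row a list of 0/1 feature values.
    The top-k cutoff is an illustrative assumption, not the article's criterion.
    """
    p = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data: feature 0 copies the label, feature 1 is constant,
# feature 2 is balanced and independent of the label.
y = [0, 1, 0, 1, 0, 1, 0, 1]
X = [[label, 0, (i // 2) % 2] for i, label in enumerate(y)]
selected, scores = screen_features(X, y, k=1)
```

In this toy data only feature 0 carries information about the label, so it is the one retained; the other two features have zero empirical mutual information with `y`.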
Copyright National Center for Philosophy and Social Sciences Documentation