Basic article information

Title: Feature screening for ultrahigh dimensional binary data
Authors: Guan, Guoyu; Shan, Na; Guo, Jianhua; et al.
Journal: Statistics and Its Interface
Print ISSN: 1938-7989
Electronic ISSN: 1938-7997
Publication year: 2017
Volume: 11
Issue: 1
Pages: 41-50
DOI: 10.4310/SII.2018.v11.n1.a4
Language: English
Publisher: International Press
Abstract: With the rapid development of information technology, the volume of ultrahigh dimensional binary data has increased dramatically, making feature screening a necessary step in real data analysis. In this article, we propose an $L_0$-regularization feature screening procedure for the naive Bayes classifier, which is equivalent to the classical mutual information screening method. However, the tuning parameter in $L_0$-regularization is hard to select and lacks theoretical support. To address this, a BIC-type criterion is applied to identify important features. Moreover, the asymptotic properties of the proposed method are theoretically investigated under some mild assumptions. Lastly, its outstanding performance is numerically confirmed on simulated data, and a real example of Chinese document classification is presented for illustration.
Keywords: feature screening; $L_0$-regularization; naive Bayes; screening consistency
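
Illustrative note: the abstract refers to the classical mutual information screening method for binary features. The following is a minimal sketch of that general idea only, not the authors' procedure. The function names `mutual_information` and `screen_features`, the toy data, and the fixed top-`k` cutoff are all assumptions made for illustration; in particular, the fixed `k` stands in for the BIC-type criterion described in the abstract, which is not implemented here.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # joint counts of (feature value, label)
    px = Counter(x)             # marginal counts of the feature
    py = Counter(y)             # marginal counts of the label
    mi = 0.0
    for (a, b), c in joint.items():
        # (c / n) is the joint frequency; c * n / (px[a] * py[b]) is the
        # ratio of the joint frequency to the product of the marginals
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def screen_features(X, y, k):
    """Rank features by mutual information with y and keep the top k.

    Assumption: a fixed k replaces the data-driven threshold of the paper.
    """
    p = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: feature 0 equals the label, feature 1 is constant,
# feature 2 is unrelated to the label.
X = [[0, 1, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 1, 1]
selected, scores = screen_features(X, y, k=1)
```

In this toy example only feature 0 carries information about the label, so it is the one retained; the other two features have zero empirical mutual information with `y`.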