Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2017
Volume: 17
Issue: 10
Pages: 135-144
Publisher: International Journal of Computer Science and Network Security
Abstract: A variety of feature ranking (FR) algorithms are available for selecting appropriate features from text data for a classification task. To improve the feature selection process, the data is preprocessed to remove terms that are too frequent or too rare, a step called pruning. Although not required for non-text data, pruning has become an essential step for simplifying feature selection on text data, which in turn boosts overall classification performance. In this paper we study the effect of pruning on eight well-known feature selection metrics, namely NDM, IG, ODDS, CHI, DFS, POIS, GINI and ACC2. The FR metrics are evaluated with micro and macro F1 measures using an SVM classifier. Experimental results on five benchmark datasets (WAP, RE0, RE1, K1a and K1b) show that pruning adversely affects three feature ranking algorithms, IG, DFS and ACC2, for which it reduces the overall classification performance. For the remaining five FR metrics, pruning improves classification performance.
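To make the pruning step concrete, the following is a minimal Python sketch of document-frequency pruning. It is an illustration only, not the authors' implementation: the function name `prune_vocabulary`, the parameters `min_df` and `max_df_ratio`, their default values, and the toy documents are all assumptions, since the abstract does not state which thresholds were used.

```python
from collections import Counter


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Return the set of terms kept after pruning (hypothetical helper).

    docs: list of tokenized documents (each a list of strings).
    min_df: assumed lower bound; terms in fewer documents are "too rare".
    max_df_ratio: assumed upper bound; terms in a larger fraction of
        documents are "too frequent".
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokens))
    return {
        term
        for term, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }


# Toy corpus, invented for illustration.
docs = [
    ["the", "market", "rose"],
    ["the", "market", "fell"],
    ["the", "team", "won"],
    ["the", "team", "lost"],
]
kept = prune_vocabulary(docs, min_df=2, max_df_ratio=0.9)
```

In this toy corpus, "the" occurs in every document and is dropped as too frequent, while the terms that occur in a single document are dropped as too rare. A feature ranking metric would then be applied only to the terms that survive.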