Article Information

  • Title: Effect of Pruning on Feature Ranking Metrics in Highly Skewed Datasets in Text Classification
  • Authors: Muhammad Nabeel Asim; Abdur Rehman; Muhammad Idrees
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Year: 2017
  • Volume: 17
  • Issue: 10
  • Pages: 135-144
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: A variety of feature ranking (FR) algorithms are available for selecting appropriate features from text data for a classification task. To improve the feature selection process, the data is preprocessed to remove terms that are too frequent or too rare, a step called pruning. Although not required for non-text data, pruning has become an essential step that simplifies feature selection on text data and thereby boosts overall classification performance. In this paper we study the effect of pruning on eight well-known feature ranking metrics, namely NDM, IG, ODDS, CHI, DFS, POIS, GINI and ACC2. The FR metrics are evaluated with micro- and macro-averaged F1 measures using an SVM classifier. Experimental results on five benchmark datasets (WAP, RE0, RE1, K1a and K1b) show that pruning adversely affects three feature ranking algorithms, IG, DFS and ACC2, reducing overall classification performance for them, while it improves classification performance for the remaining five FR metrics.
  • Keywords: Text Classification; ranking algorithms
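The pruning step described in the abstract is commonly implemented as document-frequency (DF) thresholding applied before any feature ranking metric runs. The Python sketch below illustrates that ordering under stated assumptions: it drops terms appearing in fewer than `min_df` documents or in more than a `max_df_ratio` fraction of documents, then ranks the surviving terms with a textbook one-vs-rest chi-square (CHI) score. The function names `prune_vocabulary`, `chi_square` and `rank_terms`, the threshold values and the toy corpus are hypothetical illustrations, not the authors' implementation or experimental settings.

```python
from collections import Counter


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.5):
    """Return terms whose document frequency (DF) lies in [min_df, max_df_ratio * N].

    Hypothetical thresholds: these are NOT the pruning bounds used in the paper.
    """
    n_docs = len(docs)
    # DF counts each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    max_df = max_df_ratio * n_docs
    # Drop too-rare terms (DF < min_df) and too-frequent terms (DF > max_df).
    return {term for term, count in df.items() if min_df <= count <= max_df}


def chi_square(term, docs, labels, target):
    """One-vs-rest chi-square score of `term` for class `target` (textbook 2x2 form)."""
    doc_sets = [set(doc) for doc in docs]
    n = len(doc_sets)
    a = sum(1 for d, y in zip(doc_sets, labels) if term in d and y == target)      # present, in class
    b = sum(1 for d, y in zip(doc_sets, labels) if term in d and y != target)      # present, other classes
    c = sum(1 for d, y in zip(doc_sets, labels) if term not in d and y == target)  # absent, in class
    d_ = n - a - b - c                                                              # absent, other classes
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return n * (a * d_ - b * c) ** 2 / denom if denom else 0.0


def rank_terms(vocab, docs, labels, target):
    """Sort the (already pruned) terms by descending chi-square score for one class."""
    return sorted(vocab, key=lambda t: chi_square(t, docs, labels, target), reverse=True)


# Toy corpus (hypothetical): each document is a token list with one class label.
docs = [
    ["svm", "text", "the"],
    ["svm", "kernel", "the"],
    ["text", "mining", "the"],
    ["ranking", "the", "rare"],
]
labels = ["ml", "ml", "ir", "ir"]

vocab = prune_vocabulary(docs)  # "the" (DF 4) and all DF-1 terms are pruned
print(sorted(vocab))            # ['svm', 'text']
print(rank_terms(vocab, docs, labels, "ml"))  # ['svm', 'text']
```

Because ranking only ever sees terms that survive the DF thresholds, pruning changes the candidate pool every FR metric chooses from; the sketch shows this for CHI only, and other metrics such as IG or ACC2 would plug in at the same point.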