Basic Article Information

  • Title: A Dynamic Two-Layers MI and Clustering-based Ensemble Feature Selection for Multi-Labels Text Classification
  • Authors: Adil Yaseen Taha; Sabrina Tiun; Abdul Hadi Abd Rahman
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2020
  • Volume: 11
  • Issue: 7
  • DOI: 10.14569/IJACSA.2020.0110764
  • Publisher: Science and Information Society (SAI)
  • Abstract: Multi-label text classification addresses the setting in which each sample is associated with multiple labels. Text data also suffers from high dimensionality. To resolve this issue, a feature selection (FS) method can be applied to efficiently remove noisy, irrelevant, and redundant features, and multi-label FS is a powerful tool for solving the high-dimension problem. To handle the correlation and high-dimensionality problems in multi-label text classification, this paper investigates various heterogeneous FS ensemble schemes. In addition, it proposes an enhanced FS method called the dynamic multi-label two-layers MI and clustering-based ensemble feature selection algorithm (DMMC-EFS). The proposed method considers: 1) the dynamic global weight of each feature, 2) a heterogeneous ensemble, and 3) maximum dependency and relevancy and minimum redundancy of features. The method aims to overcome the high dimensionality of multi-label datasets and to improve multi-label text classification. Experiments were conducted on three benchmark datasets: Reuters-21578, Bibtex, and Enron. The experimental results show that DMMC-EFS significantly outperforms other state-of-the-art conventional and ensemble multi-label FS methods.
  • Keywords: Multi-label text classification; high dimensionality; filtering method; ensemble clustering; ensemble MI feature selection
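
The abstract's "maximum relevancy, minimum redundancy" criterion can be illustrated with a minimal sketch. This is not the DMMC-EFS algorithm, whose details this record does not give. It is a plain mutual-information (MI) filter with a greedy redundancy penalty, assuming binary term indicators and binary labels. The function names `mutual_information` and `select_features` and the toy data are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """MI (in nats) between two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(X, Y, k):
    """Greedily pick k feature indices (assumed scheme, not the paper's).

    X: rows of binary term indicators, one row per document.
    Y: rows of binary label indicators, one row per document.
    Relevance is the mean MI between a feature and each label;
    redundancy is the mean MI between a feature and those already chosen.
    """
    feats = list(zip(*X))   # one tuple per feature column
    labels = list(zip(*Y))  # one tuple per label column
    relevance = [
        sum(mutual_information(f, l) for l in labels) / len(labels)
        for f in feats
    ]
    selected = []
    remaining = set(range(len(feats)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(feats[j], feats[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 mirrors label 0, feature 1 duplicates feature 0,
# feature 2 mirrors label 1, feature 3 is constant.
Y_TOY = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
X_TOY = [[a, a, b, 0] for a, b in Y_TOY]

print(select_features(X_TOY, Y_TOY, 2))  # prints [0, 2]
```

On the toy data, the duplicate feature 1 is skipped in favour of feature 2: its relevance matches that of feature 0, but its redundancy with the already selected feature cancels that out.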