
Article Information

  • Title: Learning from High-Dimensional and Class-Imbalanced Datasets Using Random Forests
  • Author: Barbara Pes
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Publication year: 2021
  • Volume: 12
  • Issue: 8
  • Page: 286
  • DOI: 10.3390/info12080286
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has been thus far conducted on which approaches might be best suited to deal with datasets that are class-imbalanced and high-dimensional at the same time (i.e., with a large number of features). This work attempts to give a contribution to this challenging research area by studying the effectiveness of hybrid learning strategies that involve the integration of feature selection techniques, to reduce the data dimensionality, with proper methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven to be effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach, when compared to using only feature selection or imbalance learning methods alone.