Article Information

  • Title: Attribute Overlap Minimization and Outlier Elimination as Dimensionality Reduction Techniques for Text Classification Algorithms
  • Authors: Fong, Simon; Cerone, Antonio
  • Journal: Journal of Emerging Technologies in Web Intelligence
  • Print ISSN: 1798-0461
  • Publication year: 2012
  • Volume: 4
  • Issue: 3
  • Pages: 259-263
  • DOI: 10.4304/jetwi.4.3.259-263
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: Text classification is the task of assigning free-text documents to predefined groups. Many algorithms have been proposed; in particular, dimensionality reduction (DR), an important data pre-processing step, has been extensively studied. DR can effectively reduce the feature representation space, which in turn helps improve the efficiency of text classification. Two DR methods, Attribute Overlap Minimization (AOM) and Outlier Elimination (OE), are applied to downsize the feature representation space, in the number of attributes and the number of instances respectively, prior to training a decision model for text classification. AOM works by reassigning each overlapped attribute (attributes are also known as features or keywords) to the group in which it has a higher occurrence frequency. Dimensionality is lowered when each group is described only by significant and unique attributes. OE eliminates instances that describe infrequent attributes. These two DR techniques can be used together with conventional feature selection to further enhance their effectiveness. In this paper, two datasets, one on classifying languages and one on categorizing online news into six emotion groups, are tested with a combination of AOM, OE and a wide range of classification algorithms. Significant improvements in prediction accuracy, tree size and speed are observed.
  • Keywords: Data stream mining; optimized very fast decision tree; incremental optimization
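
The abstract's descriptions of AOM and OE can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(group, tokens)` data layout, the alphabetical tie-breaking rule, and the `min_df` threshold are all assumptions made for illustration. In particular, the abstract does not say how "infrequent" is defined or when an instance is dropped; here an instance is dropped only when none of its attributes appears in at least `min_df` instances.

```python
from collections import Counter, defaultdict

def aom_owner_map(docs):
    """Map each attribute to the group where it occurs most often.

    docs: list of (group_label, list_of_tokens) pairs (assumed layout).
    Ties are broken alphabetically by group label (an assumption).
    """
    counts = defaultdict(Counter)  # attribute -> {group: occurrences}
    for group, tokens in docs:
        for tok in tokens:
            counts[tok][group] += 1
    return {
        tok: max(sorted(per_group), key=lambda g: per_group[g])
        for tok, per_group in counts.items()
    }

def apply_aom(docs, owner):
    """Keep in each instance only the attributes owned by its own group."""
    return [(g, [t for t in toks if owner[t] == g]) for g, toks in docs]

def outlier_elimination(docs, min_df=2):
    """Drop instances that contain no attribute seen in >= min_df instances.

    Hypothetical reading of "instances that describe infrequent attributes".
    Instances with no attributes at all are dropped as well.
    """
    df = Counter()  # attribute -> number of instances containing it
    for _, toks in docs:
        df.update(set(toks))
    return [(g, toks) for g, toks in docs if any(df[t] >= min_df for t in toks)]

# Toy data, invented for illustration only.
docs = [
    ("en", ["the", "cat", "la"]),
    ("en", ["the", "dog"]),
    ("es", ["la", "casa"]),
    ("es", ["la", "mesa"]),
    ("es", ["zzz"]),
]
owner = aom_owner_map(docs)       # "la" occurs more often in "es"
reduced = apply_aom(docs, owner)  # "la" is removed from the "en" instance
kept = outlier_elimination(reduced, min_df=2)
```

In this toy run, AOM removes the overlapped attribute `la` from the `en` instance (fewer attributes per group), and OE then drops the instance whose only attribute, `zzz`, appears in a single instance (fewer instances). The reduced data could then be passed to any classifier.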