
Article Information

  • Title: Latent Dirichlet Allocation complement in the vector space model for Multi-Label Text Classification
  • Authors: Víctor Carrera-Trejo; Grigori Sidorov; Sabino Miranda-Jiménez
  • Journal: International Journal of Combinatorial Optimization Problems and Informatics
  • Print ISSN: 2007-1558
  • Electronic ISSN: 2007-1558
  • Publication year: 2015
  • Volume: 6
  • Issue: 1
  • Pages: 7-19
  • Language: English
  • Publisher: International Journal of Combinatorial Optimization Problems and Informatics
  • Abstract: In the text classification task, one of the main problems is choosing which features give the best results. Various features can be used, such as words, n-grams, and syntactic n-grams of various types (POS tags, dependency relations, mixed, etc.), or combinations of these features can be considered. In addition, dimensionality-reduction algorithms such as Latent Dirichlet Allocation (LDA) can be applied to these feature sets. In this paper, we consider the multi-label text classification task and apply various feature sets. We use a subset of multi-labeled files from the Reuters-21578 corpus. We use traditional tf-idf values of the features and try both keeping and ignoring stop words. We also try several combinations of features, such as bigrams and unigrams. Finally, we experiment with adding LDA results to the Vector Space Model as new features. These last experiments obtained the best results.
  • Keywords: Multi-label text classification; Reuters-21578; Latent Dirichlet Allocation; tf-idf; Vector Space Model.
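
The abstract's idea of adding LDA results to a tf-idf vector space model as new features can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which LDA output they appended, so using per-document topic proportions is an assumption, as are the smoothed idf variant, the hyperparameters (`n_topics`, `alpha`, `beta`, `n_iter`), the function names `tfidf_matrix` and `lda_doc_topics`, and the toy documents, which are not taken from Reuters-21578. LDA is fitted here with a small collapsed Gibbs sampler in pure Python.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[d][j] is the tf-idf weight of vocab[j] in document d."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    # Assumption: idf = log(N / df) + 1, one of several common tf-idf variants.
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vocab, rows


def lda_doc_topics(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics for this token (unnormalized).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    thetas = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        thetas.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return thetas


# Toy tokenized documents (illustrative only, not from the paper's corpus).
docs = [
    ["oil", "price", "crude", "oil", "barrel"],
    ["wheat", "grain", "harvest", "wheat"],
    ["crude", "barrel", "price", "export"],
    ["grain", "export", "harvest", "corn"],
]

vocab, tfidf = tfidf_matrix(docs)
theta = lda_doc_topics(docs, n_topics=2)

# Append the topic proportions to each tf-idf row as extra feature columns.
features = [row + topics for row, topics in zip(tfidf, theta)]
```

Each row of `features` has `len(vocab) + n_topics` columns and could be passed to any multi-label classifier. Because the two feature groups live on different scales, rescaling one of them before classification may be worth testing.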