
Article information

  • Title: Experimenting N-Grams in Text Categorization
  • Authors: Abdellatif Rahmoun; Zakaria Elberrichi
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Year: 2007
  • Volume: 4
  • Issue: 4
  • Publisher: Zarqa Private University
  • Abstract: This paper deals with the automatic supervised classification of documents. The proposed approach represents each document as a vector built not from words but from character n-grams, for varying n. The effects of this method are examined in several experiments that use the multivariate chi-square statistic to reduce dimensionality, the cosine and Kullback-Leibler distances, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. Performance is measured with the macro-averaged F1 measure. The results show the effectiveness of this approach compared with the bag-of-words and stem representations.
  • Keywords: Text categorization; n-grams; multivariate chi-square; cosine measure; Reuters-21578; 20 Newsgroups.
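The character n-gram representation and the two distances named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes lowercased text with whitespace kept as an ordinary character and raw n-gram counts as vector weights, since the abstract does not give the preprocessing or weighting scheme. The function names and the additive-smoothing parameter `alpha` in the Kullback-Leibler computation are hypothetical choices made so that no probability is zero.

```python
import math
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams of length n.

    Assumption: text is lowercased and whitespace is kept as an ordinary
    character; the paper's exact preprocessing is not stated in the abstract.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def kl_divergence(p_counts: Counter, q_counts: Counter, alpha: float = 1e-3) -> float:
    """KL(P || Q) between two n-gram distributions.

    Hypothetical choice: additive smoothing with `alpha` over the union of
    both vocabularies, so every probability is strictly positive.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for gram in vocab:
        p = (p_counts[gram] + alpha) / p_total
        q = (q_counts[gram] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl

# Usage: compare two short texts for several values of n.
doc_a = "oil prices rose sharply"
doc_b = "crude oil price increase"
for n in (2, 3, 4):
    va, vb = char_ngrams(doc_a, n), char_ngrams(doc_b, n)
    print(n, round(cosine_similarity(va, vb), 3), round(kl_divergence(va, vb), 3))
```

Note that the Kullback-Leibler measure is asymmetric (KL(P || Q) generally differs from KL(Q || P)), so a classifier built on it has to fix an argument order, for example document versus category profile.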
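The abstract uses a multivariate chi-square statistic for dimensionality reduction but does not give its formulation. As a point of comparison only, the sketch below shows the more common per-feature (univariate) chi-square score computed from a 2×2 contingency table, taking each feature's maximum over categories and keeping the top k. This is a swapped-in, simpler technique, not the paper's multivariate variant. The names `chi_square` and `select_features` are hypothetical.

```python
def chi_square(A: int, B: int, C: int, D: int) -> float:
    """Chi-square statistic for one feature and one category.

    A: documents in the category that contain the feature
    B: documents outside the category that contain the feature
    C: documents in the category without the feature
    D: documents outside the category without the feature
    """
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        return 0.0
    return N * (A * D - C * B) ** 2 / denom

def select_features(docs: list[set[str]], labels: list[str], k: int) -> list[str]:
    """Keep the k features (e.g. n-grams) with the highest max-over-categories score.

    `docs` holds the set of features present in each document.
    Ties are broken alphabetically so the result is deterministic.
    """
    N = len(docs)
    categories = set(labels)
    features = set().union(*docs)
    scores = {}
    for f in features:
        best = 0.0
        for c in categories:
            A = sum(1 for d, y in zip(docs, labels) if y == c and f in d)
            B = sum(1 for d, y in zip(docs, labels) if y != c and f in d)
            C = sum(1 for y in labels if y == c) - A
            D = N - A - B - C
            best = max(best, chi_square(A, B, C, D))
        scores[f] = best
    return sorted(features, key=lambda f: (-scores[f], f))[:k]

# Usage: "ab" and "cd" each perfectly separate the two categories.
docs = [{"ab", "xx"}, {"ab", "yy"}, {"cd", "xx"}, {"cd", "yy"}]
labels = ["sport", "sport", "trade", "trade"]
print(select_features(docs, labels, 2))
```

Scoring every feature against every category is quadratic in practice; on corpora the size of Reuters-21578 one would precompute per-category document frequencies instead of rescanning the documents for each feature.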
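The macro-averaged F1 measure used for evaluation computes F1 separately for each category and then takes the unweighted mean, so small categories weigh as much as large ones. The sketch below assumes single-label predictions for simplicity; Reuters-21578 is multi-label in practice, where the same per-category computation is applied to binary in-or-out decisions. Some authors instead compute F1 from macro-averaged precision and recall, and the abstract does not say which variant was used. The function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-category F1 scores (single-label sketch)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0

# Usage: category "a" has F1 = 2/3 and "b" has F1 = 0.8, so the macro average is about 0.733.
print(round(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 3))
```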