
Article Information

  • Title: Using WordNet for Text Categorization
  • Authors: Zakaria Elberrichi; Abdelattif Rahmoun; Mohamed Amine Bentaalah
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Year of publication: 2008
  • Volume: 5
  • Issue: 1
  • Publisher: Zarqa Private University
  • Abstract: This paper explores a method that uses WordNet concepts to categorize text documents. The bag-of-words model commonly used to represent text is unsatisfactory because it ignores possible relations between terms. The proposed method extracts generic concepts from WordNet for all the terms in the text, then combines them with the terms in different ways to form a new representative vector. The effects of this method are examined in several experiments using multivariate chi-square to reduce dimensionality, cosine distance as the similarity measure, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. The proposed method is especially effective in raising the macro-averaged F1 value, which increased from 0.649 to 0.714 for Reuters-21578 and from 0.667 to 0.719 for 20 Newsgroups.
  • Keywords: 20 Newsgroups; ontology; Reuters-21578; text categorization; WordNet; cosine distance
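
The idea summarized in the abstract, adding generic concepts to the term vector and comparing documents by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_HYPERNYMS` table is a hypothetical stand-in for a real WordNet lookup, the `concept:` prefix and the function names are assumptions made for illustration, and the chi-square feature selection step is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet hypernym lookup (not real WordNet data).
TOY_HYPERNYMS = {
    "wheat": "cereal",
    "corn": "cereal",
    "dollar": "currency",
    "yen": "currency",
}


def build_vector(tokens, hypernyms=TOY_HYPERNYMS):
    """Count the terms, plus one generic concept for each term that has a mapping."""
    counts = Counter(tokens)
    for tok in tokens:
        concept = hypernyms.get(tok)
        if concept is not None:
            # Prefix concepts so they cannot collide with ordinary terms.
            counts["concept:" + concept] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = ["wheat", "price"]
doc_b = ["corn", "price"]

# Terms only: the documents share just "price".
terms_only = cosine_similarity(Counter(doc_a), Counter(doc_b))
# Terms plus concepts: both documents also share "concept:cereal".
with_concepts = cosine_similarity(build_vector(doc_a), build_vector(doc_b))
```

In this toy case the two documents look more alike once the shared concept is added, which is the kind of term relation a plain bag-of-words vector cannot capture.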