
Basic article information

  • Title: Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM
  • Authors: Abdelmalek Amine ; Zakaria Elberrichi ; Michel Simonet
  • Journal: INFOCOMP
  • Print ISSN: 1807-4545
  • Year of publication: 2008
  • Volume: 7
  • Issue: 01
  • Pages: 27-35
  • Publisher: Federal University of Lavras
  • Abstract: With the large and rapidly growing number of documents available in digital form (Internet, libraries, CD-ROM…), the automatic classification of texts has become a significant research field and a fundamental task in document processing. This paper deals with the unsupervised classification of textual documents, also called text clustering, using Kohonen's Self-Organizing Maps in two new settings: a conceptual representation of texts and a representation based on n-grams, each used instead of a representation based on words. The effects of these combinations are examined in several experiments using four similarity measures. The Reuters-21578 corpus is used for evaluation, and the results are assessed with the F-measure and entropy.
  • Keywords: Text clustering, Self-Organizing Maps of Kohonen, n-grams, concept, similarity, Reuters-21578.
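
The abstract mentions an n-gram representation of texts, similarity measures, and evaluation by F-measure and entropy, but this record does not give the formulas. The Python sketch below illustrates those ideas under several assumptions: character trigrams as the n-grams, cosine as one possible similarity measure (the four measures used in the paper are not named here), and the commonly used clustering definitions of entropy and F-measure. All function names are hypothetical, and the Self-Organizing Map training itself is not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumption: n=3, lowercased text)."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_entropy(labels):
    """Entropy, in bits, of the true class labels found inside one cluster."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def total_entropy(clusters):
    """Size-weighted average of the per-cluster entropies (lower is better)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters if c)


def f_measure(clusters):
    """Clustering F-measure: best F1 per class over all clusters,
    weighted by class size (higher is better)."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(label for c in clusters for label in c)
    score = 0.0
    for label, size in class_sizes.items():
        best = 0.0
        for c in clusters:
            hits = c.count(label)
            if hits == 0:
                continue
            precision = hits / len(c)
            recall = hits / size
            best = max(best, 2 * precision * recall / (precision + recall))
        score += size / n * best
    return score
```

Here each cluster is represented as a list of the true class labels of the documents assigned to it, for example `f_measure([["grain", "grain"], ["trade", "grain"]])`. In a SOM setting, one cluster would typically correspond to the documents mapped to one map unit, which is again an assumption about the setup rather than something stated in this record.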