
Article Information

  • Title: Learning Document Similarity Using Natural Language Processing
  • Authors: Paola Merlo; James Henderson; Gerold Schneider
  • Journal: Linguistik Online
  • Print ISSN: 1615-3014
  • Year of publication: 2003
  • Volume: 17
  • Issue: 5
  • Pages: 99-115
  • DOI: 10.13092/lo.17.788
  • Abstract: The recent considerable growth in the amount of easily available on-line text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address the problem of organizing documents into meaningful groups according to their content and to visualize a text collection, providing an overview of the range of documents and of their relationships, so that they can be browsed more easily. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Great efficiency challenges arise in creating these maps. We study linguistically-motivated ways of reducing the representation of a document to increase efficiency and ways to disambiguate the words in the documents.