
Basic article information

  • Title: GENERATION OF A SET OF KEY TERMS CHARACTERISING TEXT DOCUMENTS
  • Authors: Machová, Kristína ; Szabóová, Andrea ; Bednár, Peter
  • Journal: Journal of Information and Organizational Sciences
  • Print ISSN: 1846-3312
  • Electronic ISSN: 1846-9418
  • Year of publication: 2007
  • Volume: 31
  • Issue: 1
  • Pages: 101-113
  • Publisher: Faculty of Organization and Informatics University of Zagreb
  • Abstract: The presented paper describes statistical methods (information gain, mutual information, χ² statistics, and the TF-IDF method) for generating key words from a collection of text documents. These key words should characterise the content of the text documents and can be used to retrieve relevant documents from a document collection. Term relations were detected on the basis of the conditional probability of term occurrences. The focus is on detecting words that occur together very often. Thus, key words consisting of two terms were additionally generated. Several tests were carried out using the 20 News Groups collection of text documents.
  • Keywords: text documents; key terms generation; TF-IDF method; information gain; mutual information; term relation
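
The abstract mentions detecting term relations from the conditional probability of term occurrences. The following is a minimal sketch of that general idea, not the authors' actual procedure: it estimates P(b occurs | a occurs) from document frequencies. The function name `cooccurrence_probabilities` and the whitespace tokenisation are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(docs):
    """Estimate P(b in document | a in document) for every ordered term pair.

    Hypothetical helper: terms are lower-cased, whitespace-separated tokens,
    and each term is counted at most once per document.
    """
    df = Counter()       # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = set(doc.lower().split())
        df.update(terms)
        pair_df.update(permutations(sorted(terms), 2))
    # Conditional probability: df(a, b) / df(a)
    return {(a, b): n / df[a] for (a, b), n in pair_df.items()}


docs = [
    "neural network training",
    "neural network model",
    "network cable",
]
probs = cooccurrence_probabilities(docs)
```

In this toy collection, "network" appears in every document that contains "neural", so the pair ("neural", "network") gets the highest possible estimate, while the reverse direction is lower. Pairs with a high estimate would be candidates for two-term key words under this sketch.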