Article information

  • Title: Various Approaches to Web Information Processing
  • Authors: Machová, Kristína; Bednár, Peter; Mach, Marián
  • Journal: COMPUTING AND INFORMATICS
  • Print ISSN: 1335-9150
  • Year of publication: 2007
  • Volume: 26
  • Issue: 3
  • Pages: 301-327
  • Language: English
  • Publisher: COMPUTING AND INFORMATICS
  • Abstract: The paper focuses on automatic information extraction from texts and on text document categorisation, including the pre-processing of text documents found on the Internet. Within the presented work, we have devoted our attention to the following issues related to text categorisation: increasing the precision of categorisation algorithm results with the aid of a boosting method; searching for the minimum number of decision trees that enables an improvement in categorisation; the influence of unlabeled data with predicted categories on categorisation precision; shortening the click streams needed to access a given web document; and generating key words related to a web document. The paper also presents results of experiments carried out using the 20 News Groups and Reuters-21578 document collections and a collection of documents from an Internet portal of the Markiza broadcasting company.
  • Keywords: information extraction; document categorisation; boosting; predicted categories; click stream; key word generation