
Article information

  • Title: Automated Online News Content Extraction
  • Author: B. A. Ojokoh
  • Journal: International Journal of Computer Science Research and Application
  • Print ISSN: 2012-9564
  • Electronic ISSN: 2012-9572
  • Publication year: 2012
  • Volume: 2
  • Issue: 3
  • Publisher: INREWI Publications
  • Abstract: With the growth of the Internet and related tools, there has been an exponential growth of online resources. This tremendous growth has paradoxically made the task of finding, extracting and aggregating relevant information difficult. These days, finding and browsing news is one of the most important internet activities. In this paper, a hybrid method for online news article contents extraction is presented. The method combines RSS feeds and HTML Document Object Model (DOM) tree extraction. This approach is simple and effective at solving the problems associated with heterogeneous news layout and changing content found in many existing methods. The experimental results on some selected news sites show that the approach can extract news article contents automatically, effectively and consistently. The proposed method can also be adopted for other news sites.
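
The abstract names two ingredients, RSS feeds and HTML DOM tree extraction, but this record gives no algorithmic detail. The following is a minimal Python sketch of how such a hybrid pipeline might be wired together, not the paper's actual method. The function names (`rss_items`, `extract_article_text`), the `ParagraphCollector` class, and the "container with the most paragraph text" heuristic are all assumptions made for illustration. The sketch uses only the standard library and works on strings, so fetching the feed and the article pages is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def rss_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default="").strip(),
         item.findtext("link", default="").strip())
        for item in root.iter("item")
    ]


class ParagraphCollector(HTMLParser):
    """Group <p> text by the nearest enclosing container element.

    Hypothetical helper: it approximates a DOM walk with an event parser
    and ignores malformed or unusually nested markup.
    """

    CONTAINERS = {"div", "article", "section"}

    def __init__(self):
        super().__init__()
        self.stack = []      # ids of currently open containers
        self.blocks = {}     # container id -> list of paragraph strings
        self.next_id = 0
        self.in_p = False
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.stack.append(self.next_id)
            self.next_id += 1
        elif tag == "p":
            self.in_p = True
            self.buf = []

    def handle_endtag(self, tag):
        if tag in self.CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.buf).split())  # collapse whitespace
            if text:
                key = self.stack[-1] if self.stack else -1
                self.blocks.setdefault(key, []).append(text)

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)


def extract_article_text(html):
    """Return the paragraphs of the container holding the most <p> text.

    Assumed heuristic: the article body is the container whose paragraphs
    are longest in total; navigation and teaser blocks tend to be shorter.
    """
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    best = max(parser.blocks.values(), key=lambda ps: sum(len(p) for p in ps))
    return "\n".join(best)
```

In this sketch the feed supplies the list of article URLs and titles, so link discovery does not depend on a site's front-page layout, and the HTML step only has to locate the body text within each linked page. A real system would add fetching, character-encoding handling, and a more robust content heuristic.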