Article Information

  • Title: Automatic Detection of News Articles of Interest to Regional Communities
  • Authors: Robin M. E. Swezey; Hiroyuki Sano; Shun Shiramatsu
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Publication year: 2012
  • Volume: 12
  • Issue: 6
  • Pages: 99-106
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: In this paper, we devise an approach for identifying and classifying content of interest to geographic communities in streams of news articles. We first briefly survey related work, and then present our approach, which consists of 1) filtering out content irrelevant to communities and 2) classifying the remaining relevant news articles. Using a confidence threshold, the filtering and classification tasks can be performed in one pass using the weights learned by the same algorithm. We use Bayesian text classification, and because of substantial empirical class imbalance in Web-crawled corpora, we test several approaches: Naïve Bayes, Complementary Naïve Bayes (CNB), use of {1,2,3}-grams, and use of oversampling. In our experiments on Japanese prefectures, we find that 3-gram CNB with oversampling is the most effective approach in terms of precision, while retaining acceptable training and testing times.
  • Keywords: Web Intelligence; Natural Language Processing; Machine Learning; Semantic Web
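
The pipeline summarized in the abstract (n-gram features, oversampling of minority classes, complement-style Bayesian weights, and a confidence threshold that filters and classifies in one pass) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the smoothing constant `alpha`, the threshold value, and the softmax-based confidence measure are all assumptions made for illustration; the record does not say how the paper defines confidence. Documents are assumed to be already tokenized, which for Japanese text would require a separate segmentation step.

```python
import math
import random
from collections import Counter, defaultdict


def ngrams(tokens, max_n=3):
    """Return all contiguous n-grams of length 1..max_n as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def oversample(docs, labels, seed=0):
    """Randomly duplicate minority-class documents until every class
    has as many documents as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    target = max(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for doc in items + extra:
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels


def train_cnb(docs, labels, max_n=3, alpha=1.0):
    """Learn complement weights: for each class, the smoothed log
    frequency of every feature in all OTHER classes."""
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(ngrams(doc, max_n))
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)
    grand_total = sum(total.values())
    weights = {}
    for label, counts in class_counts.items():
        # Denominator: complement token count plus smoothing mass.
        denom = grand_total - sum(counts.values()) + alpha * len(total)
        weights[label] = {
            feat: math.log((total[feat] - counts[feat] + alpha) / denom)
            for feat in total
        }
    return weights


def classify(weights, doc, max_n=3, threshold=0.7):
    """Return (label, confidence); label is None when the document is
    filtered out. The confidence is an assumed measure: a softmax over
    negated, length-averaged complement scores."""
    vocab = next(iter(weights.values()))
    feats = [f for f in ngrams(doc, max_n) if f in vocab]
    if not feats:
        return None, 0.0  # no known features: treat as irrelevant
    # A low complement score means a good fit, so negate it.
    scores = {label: -sum(w[f] for f in feats) / len(feats)
              for label, w in weights.items()}
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    best = max(exp, key=exp.get)
    confidence = exp[best] / sum(exp.values())
    return (best if confidence >= threshold else None), confidence


# Toy, hypothetical corpus with a 3:1 class imbalance.
docs = [["nagoya", "castle", "festival"],
        ["nagoya", "toyota", "factory"],
        ["toyota", "festival", "nagoya"],
        ["osaka", "castle", "takoyaki"]]
labels = ["aichi", "aichi", "aichi", "osaka"]

balanced_docs, balanced_labels = oversample(docs, labels)
model = train_cnb(balanced_docs, balanced_labels)
result = classify(model, ["nagoya", "toyota"])
```

In this sketch a document whose best class falls below the threshold, or which contains no known n-grams, gets the label `None`, so filtering and classification share the same learned weights. Averaging the scores over the number of matched features is a design choice made here to keep the confidence from saturating on long documents.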