
Article Information

  • Title: An Improved Classification Strategy for Filtering Relevant Tweets Using Bag-of-Word Classifiers
  • Authors: Muhammad Asif Hossain Khan; Masayuki Iwai; Kaoru Sezaki
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Year: 2013
  • Volume: 8
  • Issue: 3
  • Pages: 823-832
  • DOI: 10.11185/imt.8.823
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: In this paper, we present a classification framework for identifying tweets relevant to specific target sectors. Because the length of an individual tweet is restricted, tweet classification faces additional challenges that are absent from most other short-text classification problems, let alone the classification of standard written text. Hence, bag-of-word classifiers, which have been applied successfully to text classification in other domains, fail to achieve a similar level of accuracy on tweets. We propose a collocation feature selection algorithm for tweet classification. Moreover, we propose a strategy, built on the selected collocation features, for identifying and removing confounding outliers from a training set. An evaluation on two real-world datasets shows that the proposed model yields better accuracy than the unigram model, the uni-bigram model, and a partially supervised topic model on two different classification tasks.
  • Keywords: short text classification; microblog analysis; tweet filtering; bag-of-word classifiers; social networks
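To make the two baselines named in the abstract (the unigram and uni-bigram models) more concrete, the following is a minimal sketch of how unigram and uni-bigram bag-of-words features might be extracted from a single tweet. It is a generic illustration only, not the authors' collocation feature selection algorithm or outlier-removal strategy. The whitespace tokenizer and the function names `tokenize`, `unigram_features`, and `uni_bigram_features` are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase and split on
    # whitespace. Real tweet tokenizers usually also handle hashtags,
    # @-mentions, URLs, and emoticons.
    return text.lower().split()


def unigram_features(text: str) -> Counter:
    """Bag-of-words (unigram) features: the count of each single token."""
    return Counter(tokenize(text))


def uni_bigram_features(text: str) -> Counter:
    """Unigram counts plus counts of adjacent token pairs (bigrams)."""
    tokens = tokenize(text)
    feats = Counter(tokens)
    # Each bigram is stored as a single space-joined string feature.
    feats.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return feats


if __name__ == "__main__":
    tweet = "heavy rain delays trains downtown"
    print(sorted(unigram_features(tweet)))
    print(sorted(uni_bigram_features(tweet)))
```

Because a tweet contains only a handful of tokens, each such feature vector is extremely sparse, which is one way to see why bag-of-word classifiers trained on these features can struggle with tweets compared with longer documents.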