Article Information

  • Title: Dimensionality Reduction of Distributed Vector Word Representations and Emoticon Stemming for Sentiment Analysis
  • Authors: Brian Dickinson; Michael Ganger; Wei Hu
  • Journal: Journal of Data Analysis and Information Processing
  • Print ISSN: 2327-7211
  • Online ISSN: 2327-7203
  • Year: 2015
  • Volume: 03
  • Issue: 04
  • Pages: 153-162
  • DOI: 10.4236/jdaip.2015.34015
  • Language: English
  • Publisher: Scientific Research Publishing
  • Abstract: Social media platforms such as Twitter and the Internet Movie Database (IMDb) contain vast amounts of data with applications in predictive sentiment analysis of movie sales, stock market fluctuations, brand opinion, and current events. Using a Stanford dataset drawn from IMDb, we identify some of the most significant phrases for detecting sentiment across a wide variety of movie reviews. Twitter data are especially attractive because its streaming API makes them available in real time. Analyzing this data effectively in a streaming fashion requires efficient models, which may be improved by reducing the dimensionality of input vectors. One way this has been done in the past is by using emoticons; we propose a method for further reducing these features by identifying common structure in emoticons with similar sentiment. We also examine the gender distribution of emoticon usage and find that preferences for certain emoticons differ disproportionately between males and females. Despite the roughly equal gender distribution on Twitter, emoticon usage is predominantly female. Furthermore, we find that distributed vector representations, such as those produced by Word2Vec, may be reduced through feature selection. This analysis was done on a manually labeled sample of 1000 tweets from a new dataset, the Large Emoticon Corpus, which consists of about 8.5 million tweets containing emoticons collected over a five-day period in May 2015. Additionally, using the common structure of similar emoticons, we are able to characterize positive and negative emoticons with two regular expressions that account for over 90% of emoticon usage in the Large Emoticon Corpus.
  • Keywords: Natural Language; Emoticon; Twitter; Review
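Emoticon stemming, as summarized in the abstract, collapses many surface forms of an emoticon into a single sentiment feature. The paper's two regular expressions are not given on this page, so the sketch below uses hypothetical `POSITIVE` and `NEGATIVE` patterns (eyes, optional nose, one or more mouth characters) and a hypothetical `stem_emoticon` helper. It only illustrates the idea of replacing every matching token with one of two class tokens, assuming simple whitespace tokenization.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the two regular
# expressions from the paper. Each matches eyes, an optional nose, and one or
# more mouth characters.
POSITIVE = re.compile(r"[:;=8][-o^']?[)\]}DpP]+")
NEGATIVE = re.compile(r"[:;=8][-o^']?[(\[{]+")


def stem_emoticon(token: str) -> str:
    """Map a whole-token emoticon to EMO_POS or EMO_NEG; leave other tokens unchanged."""
    if POSITIVE.fullmatch(token):
        return "EMO_POS"
    if NEGATIVE.fullmatch(token):
        return "EMO_NEG"
    return token


def stem_tweet(text: str) -> list[str]:
    """Whitespace-tokenize a tweet and stem every emoticon token."""
    return [stem_emoticon(tok) for tok in text.split()]


if __name__ == "__main__":
    tweets = ["great movie :-)))", "so tired :(", "lol ;D", "ok :)", "sad :'("]
    # Distinct emoticon tokens before stemming vs. features after stemming.
    raw = {tok for t in tweets for tok in t.split() if stem_emoticon(tok) != tok}
    stemmed = {stem_emoticon(tok) for tok in raw}
    # prints: 5 distinct emoticons -> 2 features: ['EMO_NEG', 'EMO_POS']
    print(f"{len(raw)} distinct emoticons -> {len(stemmed)} features: {sorted(stemmed)}")
```

Under these assumptions, `:-)))`, `:)` and `;D` all become `EMO_POS`, so several sparse emoticon features shrink to one dense feature per sentiment class.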
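The abstract states that Word2Vec-style vectors may be reduced through feature selection but does not name the selection criterion. As an illustration only, the sketch below swaps in a per-dimension Fisher score, (mean_pos - mean_neg)^2 / (var_pos + var_neg), and keeps the k highest-scoring dimensions. The function names, the scoring rule, and the synthetic data are all assumptions, not the authors' method.

```python
import numpy as np


def fisher_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-dimension Fisher score for binary labels y in {0, 1} (assumed criterion)."""
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0) + 1e-12  # guard against zero variance
    return num / den


def select_dimensions(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Return the sorted indices of the k highest-scoring embedding dimensions."""
    scores = fisher_scores(X, y)
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: 200 "tweet vectors" of 100 dimensions with random labels.
    n, d = 200, 100
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 3] += 2.0 * y   # plant a label signal in two dimensions
    X[:, 42] -= 1.5 * y
    keep = select_dimensions(X, y, k=10)
    X_reduced = X[:, keep]
    print(X_reduced.shape, keep)
```

Because this kind of selection keeps whole coordinates of the embedding, the same index list can be applied to each new tweet vector in a stream without refitting anything.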