Article Information

  • Title: Building a Malay-English Code-Switching Subjectivity Corpus for Sentiment Analysis
  • Authors: Emaliana Kasmuri; Halizah Basiron
  • Journal: International Journal of Advances in Soft Computing and Its Applications
  • Print ISSN: 2074-8523
  • Year: 2019
  • Volume: 11
  • Issue: 1
  • Pages: 112-130
  • Publisher: International Center for Scientific Research and Studies
  • Abstract: Combining local and foreign languages in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching poses a new challenge for sentiment analysis as Internet users express their opinions in blogs, reviews and social networking sites. Resources for processing code-switching text in sentiment analysis are scarce, especially annotated corpora. This paper develops a guideline for building a code-switching subjectivity corpus for a mix of Malay and English known as MY-EN-CS. The guideline is suitable for any code-switching textual document. To demonstrate the guideline, this paper builds a new MY-EN-CS corpus. The corpus consists of opinionated and factual sentences constructed from a combination of words from both languages. The sentences were retrieved from blogs; MY-EN-CS sentences were then identified and annotated as either opinionated or factual. The annotation task yields a Kappa value of 0.83, which indicates the reliability of the corpus.
  • Keywords: Annotation guideline; code-switching corpus; sentiment analysis; subjectivity corpus
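The abstract reports the corpus's reliability as a Kappa value but does not say which variant was used. The sketch below assumes two annotators and Cohen's kappa. That is an assumption: the paper may use another variant, such as Fleiss' kappa for more than two annotators.

Under that assumption, the statistic is κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the ten labels are hypothetical illustrations, not data or code from MY-EN-CS.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(labels_a)

    # Observed agreement p_o: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement p_e: product of each annotator's label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for ten sentences (NOT data from MY-EN-CS).
annotator_1 = ["opinionated", "opinionated", "factual", "factual", "opinionated",
               "factual", "opinionated", "opinionated", "factual", "factual"]
annotator_2 = ["opinionated", "opinionated", "factual", "opinionated", "opinionated",
               "factual", "opinionated", "factual", "factual", "factual"]

print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.6
```

In this toy example the annotators agree on 8 of the 10 sentences, so p_o = 0.8. Each annotator labels half the sentences as opinionated, so p_e = 0.5, which gives κ = 0.6. Kappa discounts the agreement expected by chance, which is why it is usually reported instead of raw percentage agreement when describing corpus reliability.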