Basic article information

  • Title: Introducing A Large Tunisian Arabizi Dialectal Dataset for Sentiment Analysis
  • Authors: Chayma Fourati ; Hatem Haddad ; Abir Messaoudi
  • Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 226-230
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: On various social media platforms, people tend to communicate and write posts and comments informally, in their local dialects. In Africa, more than 1500 dialects and languages exist. In particular, Tunisians talk and write informally using Latin letters and numbers rather than Arabic ones. In this paper, we introduce a large common-crawl-based Tunisian Arabizi dialectal dataset dedicated to sentiment analysis. The dataset consists of a total of 100k comments (about movies, politics, sports, etc.) manually annotated by Tunisian native speakers as positive, negative, or neutral. We evaluate the dataset on the sentiment analysis task using Bidirectional Encoder Representations from Transformers (BERT) as a contextual language model, in its multilingual version (mBERT), as an embedding technique, and then combine mBERT with a Convolutional Neural Network (CNN) as the classifier. The dataset is publicly available.