
Article Information

  • Title: Building and Annotating a Codeswitched Hate Speech Corpora
  • Authors: Edward Ombui; Lawrence Muchemi; Peter Wagacha
  • Journal: International Journal of Information Technology and Computer Science
  • Print ISSN: 2074-9007
  • Electronic ISSN: 2074-9015
  • Publication year: 2021
  • Volume: 13
  • Issue: 3
  • Pages: 33-52
  • DOI: 10.5815/ijitcs.2021.03.03
  • Publisher: MECS Publisher
  • Abstract: Presidential campaign periods are a major trigger event for hate speech on social media in almost every country. A systematic review of previous studies indicates inadequate publicly available annotated datasets and hardly any evidence of theoretical underpinning for the annotation schemes used for hate speech identification. This situation stifles the development of empirically useful data for research, especially in supervised machine learning. This paper describes the methodology that was used to develop a multidimensional hate speech framework based on the duplex theory of hate [1] components that include distance, passion, commitment to hate, and hate as a story. Subsequently, an annotation scheme based on the framework was used to annotate a random sample of ~51k tweets from ~400k tweets that were collected during the August and October 2017 presidential campaign period in Kenya. This resulted in a gold-standard codeswitched dataset that could be used for comparative and empirical studies in supervised machine learning. The resulting classifiers trained on this dataset could be used to provide real-time monitoring of hate speech spikes on social media and inform data-driven decision-making by relevant security agencies in government.
  • Keywords: Annotation scheme; Hate Speech; Dataset; distancing language; Code-switching