Article Information

  • Title: An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India
  • Authors: Tharindu Ranasinghe; Marcos Zampieri
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Year: 2021
  • Volume: 12
  • Issue: 8
  • Article number: 306
  • DOI: 10.3390/info12080306
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been published in the last few years, with promising results. The majority of these studies, however, deal with high-resource languages such as English due to the availability of datasets in these languages. Recent work has addressed offensive language identification from a low-resource perspective, exploring data augmentation strategies and trying to take advantage of existing multilingual pretrained models to cope with data scarcity in low-resource scenarios. In this work, we revisit the problem of low-resource offensive language identification by evaluating the performance of multilingual transformers in offensive language identification for languages spoken in India. We investigate languages from different families such as Indo-Aryan (e.g., Bengali, Hindi, and Urdu) and Dravidian (e.g., Tamil, Malayalam, and Kannada), creating important new technology for these languages. The results show that multilingual offensive language identification models perform better than monolingual models and that cross-lingual transformers show strong zero-shot and few-shot performance across languages.