Article Information

  • Title: Disentangling Document Topic and Author Gender in Multiple Languages: Lessons for Adversarial Debiasing
  • Authors: Erenay Dayanik; Sebastian Padó
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 50-61
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Text classification is a central tool in NLP. However, when the target classes are strongly correlated with other textual attributes, text classification models can pick up “wrong” features, leading to bad generalization and biases. In social media analysis, this problem surfaces for demographic user classes such as language, topic, or gender, which influence the generated text to a substantial extent. Adversarial training has been claimed to mitigate this problem, but thorough evaluation is missing. In this paper, we experiment with text classification of the correlated attributes of document topic and author gender, using a novel multilingual parallel corpus of TED talk transcripts. Our findings are: (a) individual classifiers for topic and author gender are indeed biased; (b) debiasing with adversarial training works for topic, but breaks down for author gender; (c) gender debiasing results differ across languages. We interpret the result in terms of feature space overlap, highlighting the role of linguistic surface realization of the target classes.