Article Information

Title: An Overview of Fairness in Data – Illuminating the Bias in Data Pipeline
Authors: Senthil Kumar B; Aravindan Chandrabose; Bharathi Raja Chakravarthi et al.
Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2021
Volume: 2021
Pages: 34-45
Language: English
Publisher: ACL Anthology
Abstract: Data in general encodes human biases by default; being aware of this is a good start, and the research around how to handle it is ongoing. The term ‘bias’ is extensively used in various contexts in NLP systems. In our research the focus is specific to biases such as gender, racism, religion, demographic and other intersectional views on biases that prevail in text processing systems responsible for systematically discriminating specific population, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity and inclusion of specific population while utilizing the NLP applications. The tools and technology at the intermediate level utilize biased data, and transfer or amplify this bias to the downstream applications. However, it is not enough to be colourblind, gender-neutral alone when designing an unbiased technology – instead, we should take a conscious effort by designing a unified framework to measure and benchmark the bias. In this paper, we recommend six measures and one augment measure based on the observations of the bias in data, annotations, text representations and debiasing techniques.