Who Gets Protected? A Fairness Analysis of Cross-Lingual Social Bias Detection for Hindi

Sonia Shahzadi1, Sanjiv Kumar1
1Indian Institute of Technology (IIT) Delhi
DOI: https://doi.org/10.71448/bcds2452-1
Published: 30/06/2024
Cite this article as: Sonia Shahzadi, Sanjiv Kumar. Who Gets Protected? A Fairness Analysis of Cross-Lingual Social Bias Detection for Hindi. Bulletin of Computer and Data Sciences, Volume 5 Issue 2. Page: 1-13.

Abstract

Automatic social bias detection is increasingly deployed to moderate harmful content on social media, often in settings where training data for low-resource languages is scarce. Recent work shows that multilingual transformers fine-tuned on high-resource languages can be adapted to detect biased content in Hindi with strong overall F1 scores. However, little is known about how such cross-lingual bias detectors behave across different social groups: do they protect all communities equally, or do some groups experience systematically higher false positives or false negatives? In this paper, we present a group-level fairness analysis of cross-lingual social bias detection for Hindi. Building on a Hindi social bias dataset annotated with bias labels, categories (e.g., religion, politics, caste, occupation), targets, and sentiment, we derive a set of group indicators for religious communities, political actors, and caste-related mentions. We then compare several training regimes for XLM-R: (i) Hindi-only training, (ii) sequential English→Hindi fine-tuning, (iii) joint English+Hindi training, and (iv) a translate-to-English pipeline. For each setup, we report both global metrics and group-wise error rates (true positive rate, false positive rate, false negative rate) and summarize disparities via worst-group F1 and average absolute gap. Our analysis reveals three key findings. First, cross-lingual transfer that improves overall F1 may increase error disparities for specific communities, especially minority or politically sensitive groups. Second, translate-to-English pipelines systematically over-flag some religious and political groups compared to native-script models. Third, a simple group-aware reweighting scheme can substantially reduce worst-group error without sacrificing average performance. We conclude with recommendations for evaluating and mitigating unfairness when deploying cross-lingual bias detectors in Hindi and other low-resource languages.

Keywords: cross-lingual bias detection, multilingual transformers, group fairness analysis, Hindi social media moderation, bias mitigation strategies
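
The sketch below is not the authors' code; it illustrates one plausible way to compute the group-wise error rates and disparity summaries named in the abstract (per-group TPR, FPR, FNR, and F1, plus worst-group F1 and an average absolute gap, taken here as the mean absolute deviation of per-group F1 from the cross-group mean, which is only one possible reading of that term). All function and variable names are illustrative.

```python
# Minimal sketch (assumed, not the paper's implementation) of a group-wise
# fairness report: per-group error rates plus two disparity summaries.
from collections import defaultdict


def group_fairness_report(y_true, y_pred, groups):
    """y_true, y_pred: 0/1 bias labels; groups: one group name per example
    (e.g. 'religion', 'politics', 'caste'), with None for unmarked examples."""
    per_group = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if g is None:
            continue
        # Classify each prediction into the confusion-matrix cell for its group.
        key = "tp" if t and p else "fn" if t else "fp" if p else "tn"
        per_group[g][key] += 1

    report = {}
    for g, c in per_group.items():
        tpr = c["tp"] / max(c["tp"] + c["fn"], 1)   # true positive rate
        fpr = c["fp"] / max(c["fp"] + c["tn"], 1)   # false positive rate
        fnr = 1.0 - tpr                              # false negative rate
        prec = c["tp"] / max(c["tp"] + c["fp"], 1)
        f1 = 2 * prec * tpr / max(prec + tpr, 1e-9)
        report[g] = {"TPR": tpr, "FPR": fpr, "FNR": fnr, "F1": f1}

    # Disparity summaries over the per-group F1 scores.
    f1s = [m["F1"] for m in report.values()]
    worst_group_f1 = min(f1s)
    mean_f1 = sum(f1s) / len(f1s)
    avg_abs_gap = sum(abs(f - mean_f1) for f in f1s) / len(f1s)
    return report, worst_group_f1, avg_abs_gap
```

Under this reading, a larger average absolute gap means the per-group F1 scores are more spread out, so a model can improve its overall F1 while both the gap and the worst-group F1 worsen, which is the pattern described in the first finding.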


Copyright © 2024 Sonia Shahzadi, Sanjiv Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
