Automatic social bias detection is increasingly deployed to moderate harmful content on social media, often in settings where training data for low-resource languages is scarce. Recent work shows that multilingual transformers fine-tuned on high-resource languages can be adapted to detect biased content in Hindi with strong overall F1 scores. However, little is known about how such cross-lingual bias detectors behave across different social groups: do they protect all communities equally, or do some groups experience systematically higher false positive or false negative rates? In this paper, we present a group-level fairness analysis of cross-lingual social bias detection for Hindi. Building on a Hindi social bias dataset annotated with bias labels, categories (e.g., religion, politics, caste, occupation), targets, and sentiment, we derive a set of group indicators for religious communities, political actors, and caste-related mentions. We then compare several training regimes for XLM-R: (i) Hindi-only training, (ii) sequential English\(\rightarrow\)Hindi fine-tuning, (iii) joint English+Hindi training, and (iv) a translate-to-English pipeline. For each setup, we report both global metrics and group-wise error rates (true positive rate, false positive rate, false negative rate), and we summarize disparities via worst-group F1 and the average absolute gap across groups. Our analysis reveals three key findings. First, cross-lingual transfer that improves overall F1 can increase error disparities for specific communities, especially minority or politically sensitive groups. Second, translate-to-English pipelines systematically over-flag some religious and political groups compared to native-script models. Third, a simple group-aware reweighting scheme can substantially reduce worst-group error without sacrificing average performance. We conclude with recommendations for evaluating and mitigating unfairness when deploying cross-lingual bias detectors in Hindi and other low-resource languages.
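As a minimal sketch of how such a group-level audit can be computed, the snippet below takes binary gold labels, binary predictions, and boolean group-membership masks (the group indicators derived from the annotations) and reports per-group TPR/FPR/FNR and F1, worst-group F1, and an average absolute gap. The names `rates` and `group_audit` are illustrative, and the gap here, the mean over groups of |group F1 − overall F1|, is one plausible instantiation of that summary rather than necessarily the paper's exact definition.

```python
import numpy as np

def rates(y_true, y_pred):
    """TPR, FPR, FNR, and F1 for binary labels (1 = biased)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    tpr = tp / max(tp + fn, 1)   # recall over truly biased posts
    fpr = fp / max(fp + tn, 1)   # unbiased posts wrongly flagged
    fnr = fn / max(tp + fn, 1)   # biased posts missed
    prec = tp / max(tp + fp, 1)
    f1 = 2 * prec * tpr / max(prec + tpr, 1e-12)
    return {"TPR": tpr, "FPR": fpr, "FNR": fnr, "F1": f1}

def group_audit(y_true, y_pred, groups):
    """groups maps a group name (e.g. 'religion') to a boolean mask
    selecting the examples that mention that group."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    overall = rates(y_true, y_pred)
    per_group = {g: rates(y_true[m], y_pred[m]) for g, m in groups.items()}
    worst_group_f1 = min(r["F1"] for r in per_group.values())
    # Assumed definition: mean |group F1 - overall F1| across groups.
    avg_abs_gap = float(np.mean([abs(r["F1"] - overall["F1"])
                                 for r in per_group.values()]))
    return overall, per_group, worst_group_f1, avg_abs_gap
```

Because group masks can overlap (a post may mention both a religious community and a political actor), each group is audited independently rather than as a partition of the test set.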
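A similarly minimal sketch of one plausible group-aware reweighting scheme follows; the abstract does not spell out the exact scheme, so this version upweights examples from groups with high held-out error, normalized so the mean loss weight stays 1, in the spirit of group DRO. The helper `group_loss_weights` and the parameter `alpha` are hypothetical names for illustration.

```python
import numpy as np

def group_loss_weights(group_ids, group_error, alpha=2.0):
    """Per-example loss weights w_i proportional to exp(alpha * err[g(i)]),
    normalized to mean 1 so the average loss scale is unchanged.
    group_ids: int array mapping each example to a group index.
    group_error: per-group error on held-out data (e.g. FNR)."""
    w_group = np.exp(alpha * np.asarray(group_error, dtype=float))
    w = w_group[np.asarray(group_ids)]
    return w / w.mean()

# Example: three groups, the second currently has the worst error rate.
weights = group_loss_weights(group_ids=[0, 1, 1, 2, 0],
                             group_error=[0.10, 0.40, 0.15])
# `weights` then multiplies the per-example cross-entropy during fine-tuning;
# recomputing group_error each epoch gives an adaptive, DRO-like schedule.
```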