Scaling Frame Analysis to Genuinely Low-Resource Languages: A Case Study in Swahili and Tamil

Margrit Betke¹, RC Wilson¹
¹Department of Computer Science, Boston University
DOI: https://doi.org/10.71448/bcds2231-2
Published: 30/06/2022
Cite this article as: Margrit Betke, RC Wilson. Scaling Frame Analysis to Genuinely Low-Resource Languages: A Case Study in Swahili and Tamil. Bulletin of Computer and Data Sciences, Volume 3 Issue 1. Page: 11-21.

Abstract

Recent advances in multilingual news framing analysis have shown promise for low-resource settings through code-switching techniques. However, existing evaluations have focused on relatively high-resource languages such as German, Turkish, and Arabic. This paper extends this line of research to two genuinely low-resource languages, Swahili and Tamil, which face significant representation gaps in existing multilingual models. We introduce new annotated datasets for gun violence framing in these languages and systematically evaluate the code-switching approach under extreme low-resource conditions. Our results show that while code-switching provides consistent improvements over zero-shot transfer, the absolute performance gap between high-resource and genuinely low-resource languages remains substantial (15-20% F1-macro). We identify linguistic distance and morphological complexity as key challenges and propose adaptations to the code-switching method that yield a 7% average improvement. Our work provides the first comprehensive analysis of cross-lingual frame detection in truly low-resource scenarios and establishes benchmarks for future research.

Keywords: Low-resource NLP, Code-switching for framing, Cross-lingual frame detection, Swahili and Tamil news, Multilingual language models



Copyright © 2022 Margrit Betke, RC Wilson. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
