Scaling Frame Analysis to Genuinely Low-Resource Languages: A Case Study in Swahili and Tamil

Research article

Margrit Betke¹, RC Wilson¹

¹Department of Computer Science, Boston University

Volume 3 Issue 1

DOI: https://doi.org/10.71448/bcds2231-2

Published: 30/06/2022

Cite this article as: Margrit Betke, RC Wilson. Scaling Frame Analysis to Genuinely Low-Resource Languages: A Case Study in Swahili and Tamil. Bulletin of Computer and Data Sciences, Volume 3 Issue 1. Page: 11-21.

Abstract

Recent advances in multilingual news framing analysis have shown promise for low-resource settings through code-switching techniques. However, existing evaluations have focused on comparatively high-resource languages such as German, Turkish, and Arabic. This paper extends this line of research to two genuinely low-resource languages, Swahili and Tamil, which are poorly represented in existing multilingual models. We introduce new annotated datasets for gun violence framing in these languages and systematically evaluate the code-switching approach under extreme low-resource conditions. Our results show that although code-switching consistently improves over zero-shot transfer, the absolute performance gap between high-resource and genuinely low-resource languages remains substantial (15–20% macro-F1). We identify linguistic distance and morphological complexity as key challenges and propose adaptations to the code-switching method that yield an average improvement of 7%. Our work provides the first comprehensive analysis of cross-lingual frame detection in truly low-resource scenarios and establishes benchmarks for future research.

Keywords: Low-resource NLP, Code-switching for framing, Cross-lingual frame detection, Swahili and Tamil news, Multilingual language models

Margrit Betke

Department of Computer Science, Boston University

betke@bu.edu

RC Wilson

Department of Computer Science, Boston University

Publication history

Received: 20/01/2022
Revised: 12/04/2022
Accepted: 20/05/2022
Published: 30/06/2022

Copyright © 2022 Margrit Betke, RC Wilson. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.