LinguaTriage: Cross-Lingual Transfer and African Language Pretraining for Low-Resource Medical Triage in Lingala
Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments, PMLR 319:87-100, 2026.
Abstract
We introduce LinguaTriage, the first medical triage classification system for Lingala, a Bantu language of Central Africa that is spoken by over 45 million people yet has no prior supervised NLP benchmarks. Working from a 616-sample dataset of annotated symptom descriptions across three urgency levels, we develop a targeted augmentation pipeline and evaluate three architectures: fine-tuned XLM-RoBERTa (XLM-RFT), a two-stage cross-lingual transfer system (XLM-RCL), and fine-tuned AfriBERTa-Large (AfriBERTaFT). AfriBERTaFT achieves a macro-F1 of 0.974 and perfect Emergency recall (1.00) on the internal test set. Mixing just 100 in-domain examples into training improves external accuracy from near-chance to 79%, demonstrating that minimal target-domain exposure matters far more for generalisation than the choice of architecture.
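For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how macro-F1 and per-class recall can be computed for a three-way triage classifier. It is illustrative only and is not the authors' evaluation code: the label names "Urgent" and "Routine" are assumptions (the abstract names only the Emergency level), and the toy predictions are made up for the example.

```python
# Minimal sketch: macro-F1 and per-class recall for a 3-class triage task.
# ASSUMPTIONS: "Urgent" and "Routine" are hypothetical label names; only
# "Emergency" appears in the abstract. The data below is toy data.

def per_class_metrics(y_true, y_pred, labels):
    """Return precision, recall, and F1 for each label."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


labels = ["Emergency", "Urgent", "Routine"]  # last two are hypothetical
y_true = ["Emergency", "Emergency", "Urgent", "Urgent", "Routine", "Routine"]
y_pred = ["Emergency", "Emergency", "Urgent", "Emergency", "Routine", "Routine"]

metrics = per_class_metrics(y_true, y_pred, labels)
score = macro_f1(metrics)

print(f"Emergency recall: {metrics['Emergency']['recall']:.2f}")
print(f"Macro-F1: {score:.3f}")
```

In this toy case every true Emergency case is caught (recall 1.00), while one Urgent case is over-triaged as Emergency, which lowers macro-F1 without affecting Emergency recall. This is why the two numbers are reported separately: in triage, a missed emergency is typically the costliest error.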