Tonative: Community-Driven Extension of African Datasets Through Human-AI Collaboration

Cynthia Jayne Amol, Sharon Ibejih
Proceedings of the AI for African Languages Conference 2025, PMLR 314:33-36, 2026.

Abstract

Sustainable creation of language resources for African languages remains a major challenge, leaving many languages severely low-resource. While community-driven approaches are effective, they are difficult to scale, and purely synthetic data risks introducing translation artifacts and bias. This paper presents Tonative, a human-AI collaborative framework that extends existing datasets by translating them into additional African languages. The approach combines automated translation with community-based validation, reducing human workload while preserving linguistic authenticity. The proposed framework supports more sustainable and scalable development of African language resources.

Cite this Paper


BibTeX
@InProceedings{pmlr-v314-amol26a,
  title = {Tonative: Community-Driven Extension of African Datasets Through Human-AI Collaboration},
  author = {Amol, Cynthia Jayne and Ibejih, Sharon},
  booktitle = {Proceedings of the AI for African Languages Conference 2025},
  pages = {33--36},
  year = {2026},
  editor = {Bainomugisha, Engineer and Mwebaze, Ernest and Kimera, Richard and Nabende, Joyce Nakatumba and Katumba, Andrew and Quinn, John},
  volume = {314},
  series = {Proceedings of Machine Learning Research},
  month = {10 Oct},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v314/main/assets/amol26a/amol26a.pdf},
  url = {https://proceedings.mlr.press/v314/amol26a.html},
  abstract = {Sustainable creation of language resources for African languages remains a major challenge, leaving many languages severely low-resource. While community-driven approaches are effective, they are difficult to scale, and purely synthetic data risks introducing translation artifacts and bias. This paper presents Tonative, a human-AI collaborative framework that extends existing datasets by translating them into additional African languages. The approach combines automated translation with community-based validation, reducing human workload while preserving linguistic authenticity. The proposed framework supports more sustainable and scalable development of African language resources.}
}
Endnote
%0 Conference Paper
%T Tonative: Community-Driven Extension of African Datasets Through Human-AI Collaboration
%A Cynthia Jayne Amol
%A Sharon Ibejih
%B Proceedings of the AI for African Languages Conference 2025
%C Proceedings of Machine Learning Research
%D 2026
%E Engineer Bainomugisha
%E Ernest Mwebaze
%E Richard Kimera
%E Joyce Nakatumba Nabende
%E Andrew Katumba
%E John Quinn
%F pmlr-v314-amol26a
%I PMLR
%P 33--36
%U https://proceedings.mlr.press/v314/amol26a.html
%V 314
%X Sustainable creation of language resources for African languages remains a major challenge, leaving many languages severely low-resource. While community-driven approaches are effective, they are difficult to scale, and purely synthetic data risks introducing translation artifacts and bias. This paper presents Tonative, a human-AI collaborative framework that extends existing datasets by translating them into additional African languages. The approach combines automated translation with community-based validation, reducing human workload while preserving linguistic authenticity. The proposed framework supports more sustainable and scalable development of African language resources.
APA
Amol, C.J. & Ibejih, S. (2026). Tonative: Community-Driven Extension of African Datasets Through Human-AI Collaboration. Proceedings of the AI for African Languages Conference 2025, in Proceedings of Machine Learning Research 314:33-36. Available from https://proceedings.mlr.press/v314/amol26a.html.
