How Effective Are AI Models in Translating English Scientific Texts to Nigerian Pidgin: A Low-resource Language?

Flora Oladipupo, Anthony Soronnadi, Ife Adebara, Olubayo Adekanmbi
Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, PMLR 296:84-89, 2025.

Abstract

This research explores the challenges and limitations of applying deep learning models to the translation of scientific texts from English to Nigerian Pidgin, a widely spoken but low-resource language in West Africa. Despite advances in machine translation, translating domain-specific content such as biological research papers presents distinct obstacles, including data scarcity, linguistic complexity, and model generalization issues. We compare the performance of several AI models (Pidgin-UNMT, mt5-base, AfriTeVa base, Afri-mt5 base, and GPT 4.0) using the BLEU, chrF, TER, and AfriCOMET metrics on Eng-PidginBioData, a newly created dataset of biological texts. Our findings reveal significant gaps in model performance, emphasizing the need for more domain-specific fine-tuning, improved dataset creation, and collaboration with native speakers to enhance translation accuracy. By presenting real-world challenges encountered in applying deep learning to low-resource languages, this research suggests strategies to overcome these barriers. Our study provides insights into the persistent challenges faced by AI-driven translation systems, from limited data to domain mismatches, and highlights ways to enhance their effectiveness for underrepresented languages. By addressing these constraints, we offer actionable strategies for more inclusive and impactful dissemination of scientific knowledge.

Cite this Paper


BibTeX
@InProceedings{pmlr-v296-oladipupo25a,
  title = {How Effective Are AI Models in Translating English Scientific Texts to Nigerian Pidgin: A Low-resource Language?},
  author = {Oladipupo, Flora and Soronnadi, Anthony and Adebara, Ife and Adekanmbi, Olubayo},
  booktitle = {Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops},
  pages = {84--89},
  year = {2025},
  editor = {Blaas, Arno and D’Costa, Priya and Feng, Fan and Kriegler, Andreas and Mason, Ian and Pan, Zhaoying and Uelwer, Tobias and Williams, Jennifer and Xie, Yubin and Yang, Rui},
  volume = {296},
  series = {Proceedings of Machine Learning Research},
  month = {28 Apr},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v296/main/assets/oladipupo25a/oladipupo25a.pdf},
  url = {https://proceedings.mlr.press/v296/oladipupo25a.html},
  abstract = {This research explores the challenges and limitations of applying deep learning models to the translation of scientific texts from English to Nigerian Pidgin, a widely spoken but low-resource language in West Africa. Despite advancements in machine translation, translating domain-specific content such as biological research papers presents unique obstacles, including data scarcity, linguistic complexity, and model generalization issues. We investigate the performance of AI models, including Pidgin-UNMT, mt5-base model, AfriTeVa base, Afri-mt5 base model and GPT 4.0 model through a comparative analysis using BLEU scores, CHRF, TER, Africomet metrics on a newly created Eng-PidginBioData dataset of biological texts. Our findings reveal significant gaps in model performance, emphasizing the need for more domain-specific fine-tuning, improved dataset creation, and collaboration with native speakers to enhance translation accuracy. By presenting real-world challenges encountered in applying deep learning to low-resource languages this research suggests strategies to overcome these barriers. Our study provides valuable insights into the persistent challenges faced by AI-driven translation systems, from limited data to domain mismatches, and highlights ways to enhance their effectiveness for underrepresented languages. By addressing these constraints, we offer actionable strategies for more inclusive and impactful scientific knowledge dissemination.}
}
Endnote
%0 Conference Paper
%T How Effective Are AI Models in Translating English Scientific Texts to Nigerian Pidgin: A Low-resource Language?
%A Flora Oladipupo
%A Anthony Soronnadi
%A Ife Adebara
%A Olubayo Adekanmbi
%B Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops
%C Proceedings of Machine Learning Research
%D 2025
%E Arno Blaas
%E Priya D’Costa
%E Fan Feng
%E Andreas Kriegler
%E Ian Mason
%E Zhaoying Pan
%E Tobias Uelwer
%E Jennifer Williams
%E Yubin Xie
%E Rui Yang
%F pmlr-v296-oladipupo25a
%I PMLR
%P 84--89
%U https://proceedings.mlr.press/v296/oladipupo25a.html
%V 296
%X This research explores the challenges and limitations of applying deep learning models to the translation of scientific texts from English to Nigerian Pidgin, a widely spoken but low-resource language in West Africa. Despite advancements in machine translation, translating domain-specific content such as biological research papers presents unique obstacles, including data scarcity, linguistic complexity, and model generalization issues. We investigate the performance of AI models, including Pidgin-UNMT, mt5-base model, AfriTeVa base, Afri-mt5 base model and GPT 4.0 model through a comparative analysis using BLEU scores, CHRF, TER, Africomet metrics on a newly created Eng-PidginBioData dataset of biological texts. Our findings reveal significant gaps in model performance, emphasizing the need for more domain-specific fine-tuning, improved dataset creation, and collaboration with native speakers to enhance translation accuracy. By presenting real-world challenges encountered in applying deep learning to low-resource languages this research suggests strategies to overcome these barriers. Our study provides valuable insights into the persistent challenges faced by AI-driven translation systems, from limited data to domain mismatches, and highlights ways to enhance their effectiveness for underrepresented languages. By addressing these constraints, we offer actionable strategies for more inclusive and impactful scientific knowledge dissemination.
APA
Oladipupo, F., Soronnadi, A., Adebara, I., & Adekanmbi, O. (2025). How Effective Are AI Models in Translating English Scientific Texts to Nigerian Pidgin: A Low-resource Language? Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, in Proceedings of Machine Learning Research 296:84-89. Available from https://proceedings.mlr.press/v296/oladipupo25a.html.

Related Material