Lexicon-Guided Morphological Tag Injection for Low-Resource Filipino-Cebuano Neural Machine Translation

KRISTINE MAE ADLAON, Nelson Marcos
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:916-923, 2026.

Abstract

Neural Machine Translation (NMT) remains difficult for low-resource languages, especially those with complex word formation systems. This work focuses on the Filipino–Cebuano language pair, where verbs encode voice and aspect using different morphological patterns. Although the two languages are closely related, their distinct verb formation strategies often create ambiguity and mismatches during translation, leading to errors in predicate interpretation and grammatical alignment. Pretrained multilingual models such as NLLB-200 provide broad language coverage, but they frequently struggle with predicate-level accuracy in closely related Philippine languages due to insufficient explicit morphological grounding. We propose a lexicon-guided morphological tag injection framework that enriches source-side input with structured linguistic cues, including aspect and voice markers derived from a curated morphological lexicon. Rather than modifying the model architecture or introducing new token embeddings, we inject morphological metadata directly into the input sequence and perform parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA). Experimental results show consistent improvements over baseline fine-tuning, particularly in constructions involving complex verbal morphology and one-to-many or many-to-one lexical mappings.
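
The source-side tag injection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy lexicon, the bracketed tag format (`[AV]`, `[PV]`, `[PFV]`, `[IPFV]`), the placement of tags before the verb, and the function name `inject_tags`. The tags are written as plain text so that an existing tokenizer could process them without new token embeddings.

```python
# Hypothetical toy lexicon: Filipino verb form -> (voice tag, aspect tag).
# The entries and the tag inventory are illustrative assumptions only.
TOY_LEXICON = {
    "kumain": ("AV", "PFV"),     # actor voice, completed aspect
    "kinain": ("PV", "PFV"),     # patient voice, completed aspect
    "kumakain": ("AV", "IPFV"),  # actor voice, incompleted aspect
}


def inject_tags(sentence, lexicon=TOY_LEXICON):
    """Prefix each verb found in the lexicon with plain-text morphological tags.

    Tokens are split on whitespace only; punctuation handling is omitted.
    """
    out = []
    for token in sentence.split():
        entry = lexicon.get(token.lower())
        if entry is not None:
            voice, aspect = entry
            out.append(f"[{voice}] [{aspect}]")
        out.append(token)
    return " ".join(out)


tagged = inject_tags("Kumain ang bata ng mangga")
print(tagged)
```

The tagged string would then serve as the source sentence during fine-tuning; sentences with no lexicon match pass through unchanged.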

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-adlaon26a,
  title = {Lexicon-Guided Morphological Tag Injection for Low-Resource Filipino-Cebuano Neural Machine Translation},
  author = {ADLAON, KRISTINE MAE and Marcos, Nelson},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {916--923},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/adlaon26a/adlaon26a.pdf},
  url = {https://proceedings.mlr.press/v318/adlaon26a.html},
  abstract = {Neural Machine Translation (NMT) remains difficult for low-resource languages, especially those with complex word formation systems. This work focuses on the Filipino–Cebuano language pair, where verbs encode voice and aspect using different morphological patterns. Although the two languages are closely related, their distinct verb formation strategies often create ambiguity and mismatches during translation, leading to errors in predicate interpretation and grammatical alignment. Pretrained multilingual models such as NLLB-200 provide broad language coverage, but they frequently struggle with predicate-level accuracy in closely related Philippine languages due to insufficient explicit morphological grounding. We propose a lexicon-guided morphological tag injection framework that enriches source-side input with structured linguistic cues, including aspect and voice markers derived from a curated morphological lexicon. Rather than modifying the model architecture or introducing new token embeddings, we inject morphological metadata directly into the input sequence and perform parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA). Experimental results show consistent improvements over baseline fine-tuning, particularly in constructions involving complex verbal morphology and one-to-many or many-to-one lexical mappings.}
}
Endnote
%0 Conference Paper
%T Lexicon-Guided Morphological Tag Injection for Low-Resource Filipino-Cebuano Neural Machine Translation
%A KRISTINE MAE ADLAON
%A Nelson Marcos
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-adlaon26a
%I PMLR
%P 916--923
%U https://proceedings.mlr.press/v318/adlaon26a.html
%V 318
%X Neural Machine Translation (NMT) remains difficult for low-resource languages, especially those with complex word formation systems. This work focuses on the Filipino–Cebuano language pair, where verbs encode voice and aspect using different morphological patterns. Although the two languages are closely related, their distinct verb formation strategies often create ambiguity and mismatches during translation, leading to errors in predicate interpretation and grammatical alignment. Pretrained multilingual models such as NLLB-200 provide broad language coverage, but they frequently struggle with predicate-level accuracy in closely related Philippine languages due to insufficient explicit morphological grounding. We propose a lexicon-guided morphological tag injection framework that enriches source-side input with structured linguistic cues, including aspect and voice markers derived from a curated morphological lexicon. Rather than modifying the model architecture or introducing new token embeddings, we inject morphological metadata directly into the input sequence and perform parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA). Experimental results show consistent improvements over baseline fine-tuning, particularly in constructions involving complex verbal morphology and one-to-many or many-to-one lexical mappings.
APA
ADLAON, K.M. & Marcos, N. (2026). Lexicon-Guided Morphological Tag Injection for Low-Resource Filipino-Cebuano Neural Machine Translation. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:916-923. Available from https://proceedings.mlr.press/v318/adlaon26a.html.
