Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection

David Basil, Chirooth Girigowda, Bradley Hauer, Grzegorz Kondrak, Sahir Momin, Ning Shi
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1036-1043, 2026.

Abstract

We study the task of automatically expanding WordNet-style lexical resources to new languages through sense generation. We generate senses by associating target-language lemmas with existing lexical concepts via semantic projection. Given a sense-tagged English corpus and its translation, our method projects the annotated synsets onto aligned target-language tokens and assigns the corresponding lemmas to those synsets. To generate alignments and ensure their quality, we augment a pretrained base aligner with a bilingual dictionary, which is also used to filter incorrect sense projections. We evaluate the method on multiple languages, comparing it to prior methods, as well as dictionary-based and large language model baselines. Results show that the proposed project-and-filter strategy improves precision while remaining interpretable and resource-efficient. We release our code, documentation, and generated sense inventories at https://github.com/UAlberta-NLP/ExpandNet.
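The project-and-filter pipeline described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration. The function names `augment_links` and `project_and_filter`, the token layout, the rule that the dictionary only fills gaps left by the base aligner, and the exact-match dictionary filter are all hypothetical simplifications, not the ExpandNet implementation. The synset IDs are WordNet-style placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical data layout (not the ExpandNet API): each source token is
# (English lemma, synset ID or None), target tokens are lemmas, and a link
# is (source index, target index).
Token = Tuple[str, Optional[str]]
Link = Tuple[int, int]
Dictionary = Dict[str, Set[str]]  # English lemma -> target-language lemmas


def augment_links(src: List[Token], tgt: List[str],
                  base_links: Set[Link], dictionary: Dictionary) -> Set[Link]:
    """Assumed rule: link each unaligned sense-tagged source token to the first
    unaligned target lemma that the dictionary lists as its translation."""
    links = set(base_links)
    used_src = {i for i, _ in links}
    used_tgt = {j for _, j in links}
    for i, (lemma, synset) in enumerate(src):
        if synset is None or i in used_src:
            continue
        translations = dictionary.get(lemma, set())
        for j, tgt_lemma in enumerate(tgt):
            if j not in used_tgt and tgt_lemma in translations:
                links.add((i, j))
                used_src.add(i)
                used_tgt.add(j)
                break  # at most one dictionary-based link per source token
    return links


def project_and_filter(src: List[Token], tgt: List[str],
                       links: Set[Link], dictionary: Dictionary) -> Dict[str, Set[str]]:
    """Project each source synset onto its aligned target lemma, keeping the
    pair only if the dictionary lists the target lemma as a translation."""
    senses: Dict[str, Set[str]] = defaultdict(set)
    for i, j in sorted(links):
        lemma, synset = src[i]
        if synset is None:
            continue  # untagged source tokens carry no sense to project
        if tgt[j] in dictionary.get(lemma, set()):
            senses[synset].add(tgt[j])
    return dict(senses)


# Toy example: "The bank approved the loan" / "La banque a approuvé le prêt"
src = [("the", None), ("bank", "bank.n.02"), ("approve", "approve.v.01"),
       ("the", None), ("loan", "loan.n.01")]
tgt = ["le", "banque", "avoir", "approuver", "le", "prêt"]
base_links = {(0, 0), (1, 1), (2, 2), (3, 4)}  # (2, 2) is a deliberate alignment error
dictionary = {"bank": {"banque", "rive"}, "approve": {"approuver"},
              "loan": {"prêt", "emprunt"}}

links = augment_links(src, tgt, base_links, dictionary)
print(project_and_filter(src, tgt, links, dictionary))
# {'bank.n.02': {'banque'}, 'loan.n.01': {'prêt'}}
```

In this toy run the dictionary recovers the loan/prêt link that the base aligner missed, while the filter discards the erroneous approve/avoir projection. This trades recall for precision, in line with the precision gains the abstract reports for the project-and-filter strategy.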

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-basil26a,
  title = {Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection},
  author = {Basil, David and Girigowda, Chirooth and Hauer, Bradley and Kondrak, Grzegorz and Momin, Sahir and Shi, Ning},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {1036--1043},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/basil26a/basil26a.pdf},
  url = {https://proceedings.mlr.press/v318/basil26a.html},
  abstract = {We study the task of automatically expanding WordNet-style lexical resources to new languages through sense generation. We generate senses by associating target-language lemmas with existing lexical concepts via semantic projection. Given a sense-tagged English corpus and its translation, our method projects the annotated synsets onto aligned target-language tokens and assigns the corresponding lemmas to those synsets. To generate alignments and ensure their quality, we augment a pretrained base aligner with a bilingual dictionary, which is also used to filter incorrect sense projections. We evaluate the method on multiple languages, comparing it to prior methods, as well as dictionary-based and large language model baselines. Results show that the proposed project-and-filter strategy improves precision while remaining interpretable and resource-efficient. We release our code, documentation, and generated sense inventories at https://github.com/UAlberta-NLP/ExpandNet.}
}
Endnote
%0 Conference Paper
%T Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection
%A David Basil
%A Chirooth Girigowda
%A Bradley Hauer
%A Grzegorz Kondrak
%A Sahir Momin
%A Ning Shi
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-basil26a
%I PMLR
%P 1036--1043
%U https://proceedings.mlr.press/v318/basil26a.html
%V 318
%X We study the task of automatically expanding WordNet-style lexical resources to new languages through sense generation. We generate senses by associating target-language lemmas with existing lexical concepts via semantic projection. Given a sense-tagged English corpus and its translation, our method projects the annotated synsets onto aligned target-language tokens and assigns the corresponding lemmas to those synsets. To generate alignments and ensure their quality, we augment a pretrained base aligner with a bilingual dictionary, which is also used to filter incorrect sense projections. We evaluate the method on multiple languages, comparing it to prior methods, as well as dictionary-based and large language model baselines. Results show that the proposed project-and-filter strategy improves precision while remaining interpretable and resource-efficient. We release our code, documentation, and generated sense inventories at https://github.com/UAlberta-NLP/ExpandNet.
APA
Basil, D., Girigowda, C., Hauer, B., Kondrak, G., Momin, S. & Shi, N. (2026). Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1036-1043. Available from https://proceedings.mlr.press/v318/basil26a.html.
