Letting Uncertainty Guide Your Multimodal Machine Translation

Wuyi Liu, Yue Gao, Yige Mao, Jing Zhao
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:2701-2710, 2025.

Abstract

Multimodal Machine Translation (MMT) leverages additional modalities, such as visual data, to enhance translation accuracy and resolve linguistic ambiguities inherent in text-only approaches. Recent advances predominantly focus on integrating image information via attention mechanisms or feature fusion techniques. However, current approaches lack explicit mechanisms to quantify and manage uncertainty during the translation process, leaving the utilization of image information a black box. This makes it difficult to address the incomplete use of visual information, and even the degradation of translation quality that visual information can cause. To address these challenges, we introduce a novel Uncertainty-Guided Multimodal Machine Translation (UG-MMT) framework that redefines how translation systems handle ambiguity through systematic uncertainty reduction. Designed with plug-and-play flexibility, our framework integrates seamlessly into existing MMT systems, requiring minimal modification while delivering significant performance gains.
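The abstract does not specify how uncertainty is quantified or how it steers the use of the image, so the PyTorch sketch below is only a rough illustration of the general idea, not the paper's method. It uses token-level predictive entropy as a generic uncertainty signal and a soft gate that blends in visual features only where the text-only decoder is uncertain; the function names (token_entropy, entropy_gate) and the threshold tau are assumptions of ours.

# Illustrative sketch only; not code from the paper. Names and the gating
# rule are assumptions chosen to show one generic uncertainty-guided fusion.
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Predictive entropy (in nats) of each target-token distribution; (B, T).
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)

def entropy_gate(text_logits, text_feats, vis_feats, tau: float = 2.0):
    # Soft gate in (0, 1): tokens whose entropy exceeds the hypothetical
    # threshold tau (nats) receive proportionally more visual signal.
    h = token_entropy(text_logits)               # (B, T)
    gate = torch.sigmoid(h - tau).unsqueeze(-1)  # (B, T, 1), broadcasts over D
    return gate * vis_feats + (1.0 - gate) * text_feats

# Toy usage with random tensors standing in for real encoder/decoder outputs.
B, T, V, D = 2, 5, 1000, 512
fused = entropy_gate(torch.randn(B, T, V), torch.randn(B, T, D), torch.randn(B, T, D))
print(fused.shape)  # torch.Size([2, 5, 512])

A sigmoid gate rather than a hard threshold keeps the fusion differentiable, so an uncertainty signal of this kind could be trained end to end alongside the rest of an MMT system.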

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-liu25c,
  title = {Letting Uncertainty Guide Your Multimodal Machine Translation},
  author = {Liu, Wuyi and Gao, Yue and Mao, Yige and Zhao, Jing},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages = {2701--2710},
  year = {2025},
  editor = {Chiappa, Silvia and Magliacane, Sara},
  volume = {286},
  series = {Proceedings of Machine Learning Research},
  month = {21--25 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/liu25c/liu25c.pdf},
  url = {https://proceedings.mlr.press/v286/liu25c.html},
  abstract = {Multimodal Machine Translation (MMT) leverages additional modalities, such as visual data, to enhance translation accuracy and resolve linguistic ambiguities inherent in text-only approaches. Recent advances predominantly focus on integrating image information via attention mechanisms or feature fusion techniques. However, current approaches lack explicit mechanisms to quantify and manage uncertainty during the translation process, leaving the utilization of image information a black box. This makes it difficult to address the incomplete use of visual information, and even the degradation of translation quality that visual information can cause. To address these challenges, we introduce a novel Uncertainty-Guided Multimodal Machine Translation (UG-MMT) framework that redefines how translation systems handle ambiguity through systematic uncertainty reduction. Designed with plug-and-play flexibility, our framework integrates seamlessly into existing MMT systems, requiring minimal modification while delivering significant performance gains.}
}
Endnote
%0 Conference Paper
%T Letting Uncertainty Guide Your Multimodal Machine Translation
%A Wuyi Liu
%A Yue Gao
%A Yige Mao
%A Jing Zhao
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-liu25c
%I PMLR
%P 2701--2710
%U https://proceedings.mlr.press/v286/liu25c.html
%V 286
%X Multimodal Machine Translation (MMT) leverages additional modalities, such as visual data, to enhance translation accuracy and resolve linguistic ambiguities inherent in text-only approaches. Recent advances predominantly focus on integrating image information via attention mechanisms or feature fusion techniques. However, current approaches lack explicit mechanisms to quantify and manage uncertainty during the translation process, leaving the utilization of image information a black box. This makes it difficult to address the incomplete use of visual information, and even the degradation of translation quality that visual information can cause. To address these challenges, we introduce a novel Uncertainty-Guided Multimodal Machine Translation (UG-MMT) framework that redefines how translation systems handle ambiguity through systematic uncertainty reduction. Designed with plug-and-play flexibility, our framework integrates seamlessly into existing MMT systems, requiring minimal modification while delivering significant performance gains.
APA
Liu, W., Gao, Y., Mao, Y. & Zhao, J. (2025). Letting Uncertainty Guide Your Multimodal Machine Translation. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:2701-2710. Available from https://proceedings.mlr.press/v286/liu25c.html.