Learning from others’ mistakes: Finetuning machine translation models with span-level error annotations

Lily H Zhang, Hamid Dadkhahi, Mara Finkelstein, Firas Trabelsi, Jiaming Luo, Markus Freitag
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74787-74803, 2025.

Abstract

Despite growing interest in incorporating feedback to improve language models, most efforts focus only on sequence-level annotations. In this work, we explore the potential of utilizing fine-grained span-level annotations from offline datasets to improve model quality. We develop a simple finetuning algorithm, called Training with Annotations (TWA), to directly train machine translation models on such annotated data. TWA utilizes targeted span-level error information while also flexibly learning what to penalize within a span. Moreover, TWA considers the overall trajectory of a sequence when deciding which non-error spans to utilize as positive signals. Experiments on English-German and Chinese-English machine translation show that TWA outperforms baselines such as supervised finetuning on sequences filtered for quality and Direct Preference Optimization on pairs constructed from the same data.
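To make the idea concrete, below is a minimal sketch of how a span-weighted token-level objective of this flavor could be set up. It is not the authors' reference implementation of TWA; the tensor names, the unlikelihood-style penalty on error tokens, and the choice to treat only pre-error, non-error tokens as positive signal are illustrative assumptions made for this example.

    # Illustrative sketch only: a span-weighted token loss in the spirit of TWA,
    # not the paper's reference implementation. Names, shapes, and the exact
    # weighting scheme are assumptions for the sake of a concrete example.
    import torch
    import torch.nn.functional as F

    def span_weighted_loss(logits, targets, error_mask, first_error_idx):
        """
        logits:          [batch, seq_len, vocab]  model outputs
        targets:         [batch, seq_len]         reference token ids
        error_mask:      [batch, seq_len]         1.0 where the token lies in an annotated error span
        first_error_idx: [batch]                  index of the first error token (seq_len if no error)
        """
        log_probs = F.log_softmax(logits, dim=-1)
        token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # [batch, seq_len]

        positions = torch.arange(targets.size(1), device=targets.device).unsqueeze(0)
        # Assumption: non-error tokens before the first error count as positive signal;
        # tokens after an error are ignored, since the trajectory has already gone off course.
        positive_mask = (error_mask == 0) & (positions < first_error_idx.unsqueeze(1))

        # Maximize likelihood on the trusted (pre-error, non-error) tokens ...
        positive_loss = -(token_logp * positive_mask).sum()
        # ... and push probability mass away from annotated error tokens
        # with an unlikelihood-style term.
        unlikelihood = -torch.log1p(-token_logp.exp().clamp(max=1 - 1e-6))
        negative_loss = (unlikelihood * error_mask).sum()

        denom = positive_mask.sum() + error_mask.sum()
        return (positive_loss + negative_loss) / denom.clamp(min=1)

The split into a likelihood term on trusted spans and a penalty term on annotated error spans mirrors the abstract's description of using span-level errors as negative signal while still learning from the rest of the sequence; how the penalty is distributed within a span and how trajectories are handled in the actual method are detailed in the paper.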

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25p,
  title     = {Learning from others’ mistakes: Finetuning machine translation models with span-level error annotations},
  author    = {Zhang, Lily H and Dadkhahi, Hamid and Finkelstein, Mara and Trabelsi, Firas and Luo, Jiaming and Freitag, Markus},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {74787--74803},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25p/zhang25p.pdf},
  url       = {https://proceedings.mlr.press/v267/zhang25p.html},
  abstract  = {Despite growing interest in incorporating feedback to improve language models, most efforts focus only on sequence-level annotations. In this work, we explore the potential of utilizing fine-grained span-level annotations from offline datasets to improve model quality. We develop a simple finetuning algorithm, called Training with Annotations (TWA), to directly train machine translation models on such annotated data. TWA utilizes targeted span-level error information while also flexibly learning what to penalize within a span. Moreover, TWA considers the overall trajectory of a sequence when deciding which non-error spans to utilize as positive signals. Experiments on English-German and Chinese-English machine translation show that TWA outperforms baselines such as supervised finetuning on sequences filtered for quality and Direct Preference Optimization on pairs constructed from the same data.}
}
Endnote
%0 Conference Paper
%T Learning from others’ mistakes: Finetuning machine translation models with span-level error annotations
%A Lily H Zhang
%A Hamid Dadkhahi
%A Mara Finkelstein
%A Firas Trabelsi
%A Jiaming Luo
%A Markus Freitag
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25p
%I PMLR
%P 74787--74803
%U https://proceedings.mlr.press/v267/zhang25p.html
%V 267
%X Despite growing interest in incorporating feedback to improve language models, most efforts focus only on sequence-level annotations. In this work, we explore the potential of utilizing fine-grained span-level annotations from offline datasets to improve model quality. We develop a simple finetuning algorithm, called Training with Annotations (TWA), to directly train machine translation models on such annotated data. TWA utilizes targeted span-level error information while also flexibly learning what to penalize within a span. Moreover, TWA considers the overall trajectory of a sequence when deciding which non-error spans to utilize as positive signals. Experiments on English-German and Chinese-English machine translation show that TWA outperforms baselines such as supervised finetuning on sequences filtered for quality and Direct Preference Optimization on pairs constructed from the same data.
APA
Zhang, L.H., Dadkhahi, H., Finkelstein, M., Trabelsi, F., Luo, J. & Freitag, M. (2025). Learning from others’ mistakes: Finetuning machine translation models with span-level error annotations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74787-74803. Available from https://proceedings.mlr.press/v267/zhang25p.html.
