Overcoming Non-monotonicity in Transducer-based Streaming Generation

Zhengrui Ma, Yang Feng, Min Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:41890-41906, 2025.

Abstract

Streaming generation models are utilized across fields, with the Transducer architecture being popular in industrial applications. However, its input-synchronous decoding mechanism presents challenges in tasks requiring non-monotonic alignments, such as simultaneous translation. In this research, we address this issue by integrating Transducer’s decoding with the history of input stream via a learnable monotonic attention. Our approach leverages the forward-backward algorithm to infer the posterior probability of alignments between the predictor states and input timestamps, which is then used to estimate the monotonic context representations, thereby avoiding the need to enumerate the exponentially large alignment space during training. Extensive experiments show that our MonoAttn-Transducer effectively handles non-monotonic alignments in streaming scenarios, offering a robust solution for complex generation tasks. Code is available at https://github.com/ictnlp/MonoAttn-Transducer.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-ma25f, title = {Overcoming Non-monotonicity in Transducer-based Streaming Generation}, author = {Ma, Zhengrui and Feng, Yang and Zhang, Min}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {41890--41906}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/ma25f/ma25f.pdf}, url = {https://proceedings.mlr.press/v267/ma25f.html}, abstract = {Streaming generation models are utilized across fields, with the Transducer architecture being popular in industrial applications. However, its input-synchronous decoding mechanism presents challenges in tasks requiring non-monotonic alignments, such as simultaneous translation. In this research, we address this issue by integrating Transducer’s decoding with the history of input stream via a learnable monotonic attention. Our approach leverages the forward-backward algorithm to infer the posterior probability of alignments between the predictor states and input timestamps, which is then used to estimate the monotonic context representations, thereby avoiding the need to enumerate the exponentially large alignment space during training. Extensive experiments show that our MonoAttn-Transducer effectively handles non-monotonic alignments in streaming scenarios, offering a robust solution for complex generation tasks. Code is available at https://github.com/ictnlp/MonoAttn-Transducer.} }
Endnote
%0 Conference Paper %T Overcoming Non-monotonicity in Transducer-based Streaming Generation %A Zhengrui Ma %A Yang Feng %A Min Zhang %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-ma25f %I PMLR %P 41890--41906 %U https://proceedings.mlr.press/v267/ma25f.html %V 267 %X Streaming generation models are utilized across fields, with the Transducer architecture being popular in industrial applications. However, its input-synchronous decoding mechanism presents challenges in tasks requiring non-monotonic alignments, such as simultaneous translation. In this research, we address this issue by integrating Transducer’s decoding with the history of input stream via a learnable monotonic attention. Our approach leverages the forward-backward algorithm to infer the posterior probability of alignments between the predictor states and input timestamps, which is then used to estimate the monotonic context representations, thereby avoiding the need to enumerate the exponentially large alignment space during training. Extensive experiments show that our MonoAttn-Transducer effectively handles non-monotonic alignments in streaming scenarios, offering a robust solution for complex generation tasks. Code is available at https://github.com/ictnlp/MonoAttn-Transducer.
APA
Ma, Z., Feng, Y. & Zhang, M.. (2025). Overcoming Non-monotonicity in Transducer-based Streaming Generation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:41890-41906 Available from https://proceedings.mlr.press/v267/ma25f.html.

Related Material