Reflection-Window Decoding: Text Generation with Selective Refinement

Zeyu Tang, Zhenhao Chen, Xiangchen Song, Loka Li, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:58739-58764, 2025.

Abstract

Autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to perform refinement and/or correction of the generated content. In this paper, we consider optimality in terms of the joint probability over the generated response, when all tokens are considered jointly. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart of the same length. Our analysis suggests that we need to be cautious when noticeable uncertainty arises during text generation, which may signal the suboptimality of the generation history. To address this pitfall of autoregressive decoding, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that refinement and generation can be carried out interchangeably as decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.
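
For readers who want a concrete picture of the mechanism the abstract sketches, the following minimal Python illustration shows one way a pausing criterion and a sliding reflection window could interact: decoding proceeds greedily, pauses when next-token uncertainty (here, entropy) crosses a threshold, and then rescores sampled alternatives for the trailing window by their joint log-probability. Everything below (the function names, the entropy-based threshold, the sampling-based refinement, and the toy bigram model) is an illustrative assumption based only on the abstract, not the authors' algorithm or implementation.

import math
import random

def entropy(probs):
    # Shannon entropy (in nats) of a next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def score_window(next_token_probs, prefix, tokens):
    # Joint log-probability of `tokens` as a continuation of `prefix`,
    # accumulated one conditional at a time.
    total, ctx = 0.0, list(prefix)
    for t in tokens:
        total += math.log(max(next_token_probs(ctx)[t], 1e-12))
        ctx.append(t)
    return total

def sample_window(next_token_probs, prefix, length):
    # Draw an alternative candidate for the reflection window by sampling.
    ctx, tokens = list(prefix), []
    for _ in range(length):
        probs = next_token_probs(ctx)
        tok = random.choices(range(len(probs)), weights=probs)[0]
        tokens.append(tok)
        ctx.append(tok)
    return tokens

def reflection_window_decode(next_token_probs, prompt, max_new=32,
                             window=8, entropy_threshold=1.0, n_candidates=4):
    out = list(prompt)
    for _ in range(max_new):
        probs = next_token_probs(out)
        # Pausing criterion (illustrative): high next-token entropy may signal
        # that the recent generation history is suboptimal, so pause and refine
        # the trailing `window` tokens before continuing.
        if entropy(probs) > entropy_threshold and len(out) - len(prompt) >= window:
            prefix = out[:-window]
            best = out[-window:]
            best_score = score_window(next_token_probs, prefix, best)
            for _ in range(n_candidates):
                cand = sample_window(next_token_probs, prefix, window)
                s = score_window(next_token_probs, prefix, cand)
                if s > best_score:
                    best, best_score = cand, s
            out = prefix + best
            probs = next_token_probs(out)
        # Ordinary autoregressive (greedy) step.
        out.append(max(range(len(probs)), key=probs.__getitem__))
    return out[len(prompt):]

# Toy stand-in for an LLM: a bigram table over a 4-token vocabulary.
def toy_next_token_probs(context):
    table = {0: [0.1, 0.6, 0.2, 0.1], 1: [0.3, 0.3, 0.3, 0.1],
             2: [0.7, 0.1, 0.1, 0.1], 3: [0.25, 0.25, 0.25, 0.25]}
    return table[context[-1] if context else 0]

print(reflection_window_decode(toy_next_token_probs, prompt=[0], max_new=12, window=4))

In this sketch, refinement only ever replaces the current window when a sampled candidate attains a higher joint log-probability, so the procedure can fall back to plain greedy decoding whenever the pausing criterion never fires.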

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-tang25a,
  title     = {Reflection-Window Decoding: Text Generation with Selective Refinement},
  author    = {Tang, Zeyu and Chen, Zhenhao and Song, Xiangchen and Li, Loka and Deng, Yunlong and Shen, Yifan and Chen, Guangyi and Spirtes, Peter and Zhang, Kun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {58739--58764},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/tang25a/tang25a.pdf},
  url       = {https://proceedings.mlr.press/v267/tang25a.html},
  abstract  = {The autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to perform refinement and/or correction of the generated content. In this paper, we consider optimality in terms of the joint probability over the generated response, when jointly considering all tokens at the same time. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart that is of the same length. Our analysis suggests that we need to be cautious when noticeable uncertainty arises during text generation, which may signal the sub-optimality of the generation history. To address the pitfall of autoregressive decoding for text generation, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that refinement and generation can be carried out interchangeably as the decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.}
}
Endnote
%0 Conference Paper
%T Reflection-Window Decoding: Text Generation with Selective Refinement
%A Zeyu Tang
%A Zhenhao Chen
%A Xiangchen Song
%A Loka Li
%A Yunlong Deng
%A Yifan Shen
%A Guangyi Chen
%A Peter Spirtes
%A Kun Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-tang25a
%I PMLR
%P 58739--58764
%U https://proceedings.mlr.press/v267/tang25a.html
%V 267
%X The autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to perform refinement and/or correction of the generated content. In this paper, we consider optimality in terms of the joint probability over the generated response, when jointly considering all tokens at the same time. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart that is of the same length. Our analysis suggests that we need to be cautious when noticeable uncertainty arises during text generation, which may signal the sub-optimality of the generation history. To address the pitfall of autoregressive decoding for text generation, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that refinement and generation can be carried out interchangeably as the decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.
APA
Tang, Z., Chen, Z., Song, X., Li, L., Deng, Y., Shen, Y., Chen, G., Spirtes, P. & Zhang, K. (2025). Reflection-Window Decoding: Text Generation with Selective Refinement. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:58739-58764. Available from https://proceedings.mlr.press/v267/tang25a.html.