The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:39171-39189, 2025.

Abstract

Off-policy deep reinforcement learning (RL) agents typically leverage replay buffers to reuse past experiences during learning. This can improve sample efficiency when the collected data is informative and aligned with the learning objectives; when it is not, it “pollutes” the replay buffer with data that can exacerbate optimization challenges, in addition to wasting environment interactions on redundant sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which in the context of deep RL is the tendency to continue an episode until termination. To address this, we propose the learn to stop (LEAST) mechanism, which uses statistics based on $Q$-values and gradients to help agents recognize when to terminate unproductive episodes early. We demonstrate that our method improves learning efficiency across a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.
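For intuition, the sketch below shows one way an early-termination check based on critic values could sit inside an off-policy data-collection loop. This is not the paper's LEAST rule: the function should_stop_early, its window/percentile thresholds, and the choice of statistic are illustrative assumptions, and the gradient-based component of LEAST is omitted.

import numpy as np

def should_stop_early(q_history, window=20, percentile=10.0, min_steps=50):
    # Hypothetical heuristic, not the paper's LEAST criterion.
    # q_history: critic estimates Q(s_t, a_t) collected during the current episode.
    # Stop if the mean of the most recent `window` values falls below a
    # low percentile of the episode's Q-values so far.
    if len(q_history) < max(window, min_steps):
        return False  # never stop very short episodes
    recent = float(np.mean(q_history[-window:]))
    cutoff = float(np.percentile(q_history, percentile))
    return recent < cutoff

# Sketch of usage inside a rollout loop (critic/env interfaces assumed):
#   q_history.append(critic(obs, action))
#   if should_stop_early(q_history):
#       obs = env.reset()   # cut the unproductive episode short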

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25ay,
  title     = {The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning},
  author    = {Liu, Jiashun and Obando-Ceron, Johan and Castro, Pablo Samuel and Courville, Aaron and Pan, Ling},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {39171--39189},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25ay/liu25ay.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25ay.html},
  abstract  = {Off-policy deep reinforcement learning (RL) agents typically leverage replay buffers for reusing past experiences during learning. This can help sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it has the effect of “polluting” the replay buffer with data that can exacerbate optimization challenges in addition to wasting environment interactions due to redundant sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose the learn to stop (LEAST) mechanism which uses statistics based on $Q$-values and gradient to guide early episode termination which helps agents recognize when to terminate unproductive episodes early. We demonstrate that our method improves learning efficiency on a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.}
}
Endnote
%0 Conference Paper
%T The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
%A Jiashun Liu
%A Johan Obando-Ceron
%A Pablo Samuel Castro
%A Aaron Courville
%A Ling Pan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25ay
%I PMLR
%P 39171--39189
%U https://proceedings.mlr.press/v267/liu25ay.html
%V 267
%X Off-policy deep reinforcement learning (RL) agents typically leverage replay buffers for reusing past experiences during learning. This can help sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it has the effect of “polluting” the replay buffer with data that can exacerbate optimization challenges in addition to wasting environment interactions due to redundant sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose the learn to stop (LEAST) mechanism which uses statistics based on $Q$-values and gradient to guide early episode termination which helps agents recognize when to terminate unproductive episodes early. We demonstrate that our method improves learning efficiency on a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.
APA
Liu, J., Obando-Ceron, J., Castro, P.S., Courville, A. & Pan, L. (2025). The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:39171-39189. Available from https://proceedings.mlr.press/v267/liu25ay.html.
