Rejecting Hallucinated State Targets during Planning

Harry Zhao, Tristan Sylvain, Romain Laroche, Doina Precup, Yoshua Bengio
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:77677-77702, 2025.

Abstract

In the planning processes of computational decision-making agents, generative or predictive models are often used as "generators" to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. Then, we devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To ensure that the evaluator is robust and non-delusional, we adopt a design that combines an off-policy compatible learning rule, a distributional architecture, and data augmentation based on hindsight relabeling. When attached to a planning agent, the evaluator learns by observing the agent’s interactions with the environment and the targets produced by its generator, without requiring changes to the agent or its generator. Our controlled experiments show significant reductions in delusional behaviors and performance improvements for various kinds of existing agents.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhao25t,
  title = {Rejecting Hallucinated State Targets during Planning},
  author = {Zhao, Harry and Sylvain, Tristan and Laroche, Romain and Precup, Doina and Bengio, Yoshua},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {77677--77702},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhao25t/zhao25t.pdf},
  url = {https://proceedings.mlr.press/v267/zhao25t.html},
  abstract = {In planning processes of computational decision-making agents, generative or predictive models are often used as "generators" to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. Then, we devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To ensure that the evaluator is robust and non-delusional, we adopted a design choice combining off-policy compatible learning rule, distributional architecture, and data augmentation based on hindsight relabeling. Attaching to a planning agent, the designed evaluator learns by observing the agent’s interactions with the environment and the targets produced by its generator, without the need to change the agent or its generator. Our controlled experiments show significant reductions in delusional behaviors and performance improvements for various kinds of existing agents.}
}
Endnote
%0 Conference Paper
%T Rejecting Hallucinated State Targets during Planning
%A Harry Zhao
%A Tristan Sylvain
%A Romain Laroche
%A Doina Precup
%A Yoshua Bengio
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhao25t
%I PMLR
%P 77677--77702
%U https://proceedings.mlr.press/v267/zhao25t.html
%V 267
%X In planning processes of computational decision-making agents, generative or predictive models are often used as "generators" to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. Then, we devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To ensure that the evaluator is robust and non-delusional, we adopted a design choice combining off-policy compatible learning rule, distributional architecture, and data augmentation based on hindsight relabeling. Attaching to a planning agent, the designed evaluator learns by observing the agent’s interactions with the environment and the targets produced by its generator, without the need to change the agent or its generator. Our controlled experiments show significant reductions in delusional behaviors and performance improvements for various kinds of existing agents.
APA
Zhao, H., Sylvain, T., Laroche, R., Precup, D. & Bengio, Y. (2025). Rejecting Hallucinated State Targets during Planning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:77677-77702. Available from https://proceedings.mlr.press/v267/zhao25t.html.
