Planning with Diffusion for Flexible Behavior Synthesis

Michael Janner, Yilun Du, Joshua Tenenbaum, Sergey Levine
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:9902-9915, 2022.

Abstract

Model-based reinforcement learning methods often use learning only for the purpose of recovering an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.
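To make the planning-as-sampling idea concrete, the sketch below shows one way a guided denoising loop over whole trajectories could look. This is not the authors' implementation: the noise-prediction network eps_model, the return predictor return_model, and all shapes and hyperparameters are illustrative assumptions. The first state is clamped to the observed state in the spirit of the inpainting-style conditioning mentioned in the abstract, and the gradient of the return predictor supplies classifier-style guidance at each denoising step.

    # Minimal sketch (assumptions, not the paper's code) of guided trajectory denoising.
    # Assumed interfaces:
    #   eps_model(x, t)    : (B, H, D), (B,) -> (B, H, D)   predicted noise
    #   return_model(x)    : (B, H, D) -> (B,)              predicted trajectory return
    import torch

    def guided_plan(eps_model, return_model, s0, horizon, obs_dim, act_dim,
                    betas, guide_scale=0.1):
        """Sample a (horizon, obs_dim + act_dim) trajectory by reverse diffusion,
        steering each step toward higher predicted return and clamping the
        first state to the observed s0 (inpainting-style conditioning)."""
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)

        x = torch.randn(horizon, obs_dim + act_dim)        # start from pure noise
        for t in reversed(range(len(betas))):
            x[0, :obs_dim] = s0                             # condition on current state
            t_batch = torch.full((1,), t, dtype=torch.long)

            # Posterior mean of the unguided reverse step (standard DDPM parameterization).
            with torch.no_grad():
                eps = eps_model(x.unsqueeze(0), t_batch).squeeze(0)
            mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])

            # Classifier-style guidance: nudge the mean along the return gradient.
            x_in = mean.detach().requires_grad_(True)
            grad = torch.autograd.grad(return_model(x_in.unsqueeze(0)).sum(), x_in)[0]
            mean = mean + guide_scale * betas[t] * grad

            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean.detach() + torch.sqrt(betas[t]) * noise

        x[0, :obs_dim] = s0
        return x                                            # planned state-action sequence

In this reading, the denoising process itself is the planner: each sampling step both draws the trajectory toward the data distribution of the learned model and, through the guidance gradient, toward higher estimated return.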

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-janner22a,
  title     = {Planning with Diffusion for Flexible Behavior Synthesis},
  author    = {Janner, Michael and Du, Yilun and Tenenbaum, Joshua and Levine, Sergey},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {9902--9915},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/janner22a/janner22a.pdf},
  url       = {https://proceedings.mlr.press/v162/janner22a.html}
}
Endnote
%0 Conference Paper
%T Planning with Diffusion for Flexible Behavior Synthesis
%A Michael Janner
%A Yilun Du
%A Joshua Tenenbaum
%A Sergey Levine
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-janner22a
%I PMLR
%P 9902--9915
%U https://proceedings.mlr.press/v162/janner22a.html
%V 162
APA
Janner, M., Du, Y., Tenenbaum, J. & Levine, S. (2022). Planning with Diffusion for Flexible Behavior Synthesis. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:9902-9915. Available from https://proceedings.mlr.press/v162/janner22a.html.
