MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:26087-26105, 2023.

Abstract

Recently, diffusion models have emerged as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL). However, these works mostly lack the ability to generalize across tasks with reward or dynamics changes. To tackle this challenge, we propose a task-oriented conditioned diffusion planner for offline meta-RL (MetaDiffuser), which treats the generalization problem as a conditional trajectory generation task guided by a contextual representation. The key is to learn a context-conditioned diffusion model that can generate task-oriented trajectories for planning across diverse tasks. To enhance the dynamics consistency of the generated trajectories while encouraging them to achieve high returns, we further design a dual-guided module in the sampling process of the diffusion model. The proposed framework is robust to the quality of the warm-start data collected from the test task and flexible enough to incorporate different task representation methods. Experimental results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines, demonstrating the outstanding conditional generation ability of the diffusion architecture.
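
To make the dual-guided sampling idea concrete, below is a minimal conceptual sketch in PyTorch of classifier-style guidance with two guides applied during the reverse diffusion process: one pushing generated trajectories toward high return and one pushing them toward low dynamics-consistency error, both conditioned on a task context. All names here (denoiser, return_fn, dyn_err_fn, the guidance weights w_return and w_dyn, and the noise schedule handling) are illustrative assumptions for exposition, not the paper's actual interfaces or implementation.

    # Conceptual sketch only; not the authors' code. Assumes user-supplied
    # callables: denoiser(tau, t, ctx) predicts noise, return_fn(tau, ctx)
    # scores expected return, dyn_err_fn(tau, ctx) scores dynamics error.
    import torch

    def dual_guided_sample(denoiser, return_fn, dyn_err_fn, context,
                           shape, timesteps, betas,
                           w_return=1.0, w_dyn=1.0):
        """Reverse diffusion whose predicted mean is shifted by two gradient
        guides: ascend the return estimate, descend the dynamics error."""
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)
        tau = torch.randn(shape)  # start from pure noise
        for t in reversed(range(timesteps)):
            t_b = torch.full((shape[0],), t, dtype=torch.long)
            eps = denoiser(tau, t_b, context)  # context-conditioned noise prediction
            # standard DDPM posterior mean
            mean = (tau - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
                   / torch.sqrt(alphas[t])
            # dual guidance: gradient of (return - dynamics error) w.r.t. tau
            with torch.enable_grad():
                tau_g = tau.detach().requires_grad_(True)
                objective = w_return * return_fn(tau_g, context).sum() \
                            - w_dyn * dyn_err_fn(tau_g, context).sum()
                grad = torch.autograd.grad(objective, tau_g)[0]
            mean = mean + betas[t] * grad  # shift the mean along the dual guide
            noise = torch.randn_like(tau) if t > 0 else torch.zeros_like(tau)
            tau = mean + torch.sqrt(betas[t]) * noise
        return tau

In this reading, the two weights trade off reward-seeking against staying on the manifold of trajectories that are physically consistent with the (inferred) task dynamics, which is what lets the planner exploit high-return behavior without generating infeasible transitions.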

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-ni23a,
  title     = {{M}eta{D}iffuser: Diffusion Model as Conditional Planner for Offline Meta-{RL}},
  author    = {Ni, Fei and Hao, Jianye and Mu, Yao and Yuan, Yifu and Zheng, Yan and Wang, Bin and Liang, Zhixuan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {26087--26105},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/ni23a/ni23a.pdf},
  url       = {https://proceedings.mlr.press/v202/ni23a.html}
}
Endnote
%0 Conference Paper
%T MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
%A Fei Ni
%A Jianye Hao
%A Yao Mu
%A Yifu Yuan
%A Yan Zheng
%A Bin Wang
%A Zhixuan Liang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-ni23a
%I PMLR
%P 26087--26105
%U https://proceedings.mlr.press/v202/ni23a.html
%V 202
APA
Ni, F., Hao, J., Mu, Y., Yuan, Y., Zheng, Y., Wang, B. & Liang, Z. (2023). MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:26087-26105. Available from https://proceedings.mlr.press/v202/ni23a.html.