Temporal Difference Learning for Model Predictive Control

Nicklas A Hansen; Hao Su; Xiaolong Wang

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8387-8406, 2022.

Abstract

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and videos are available at https://nicklashansen.github.io/td-mpc.
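The return estimate described in the abstract, predicted rewards over a short horizon plus a terminal value for everything beyond it, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which trajectory optimizer is used, so plain random shooting stands in for it here, and `toy_dynamics`, `toy_reward`, and `toy_value` are hypothetical hand-written stand-ins for the learned latent dynamics model, reward predictor, and terminal value function.

```python
import random


def estimate_return(z0, actions, dynamics, reward, value, gamma=0.99):
    """Score an action sequence: discounted predicted rewards over the
    horizon, plus a discounted terminal value at the final latent state."""
    z, total, discount = z0, 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * value(z)


def plan(z0, dynamics, reward, value, horizon=5, num_samples=256,
         gamma=0.99, seed=0):
    """Random-shooting planner (an assumption, chosen for brevity):
    sample action sequences and keep the one with the best estimate."""
    rng = random.Random(seed)
    best_score, best_actions = float("-inf"), None
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        score = estimate_return(z0, actions, dynamics, reward, value, gamma)
        if score > best_score:
            best_score, best_actions = score, actions
    return best_actions, best_score


# Hypothetical toy problem: a 1-D latent state that should be driven to 0.
def toy_dynamics(z, a):
    return z + 0.5 * a


def toy_reward(z, a):
    return -abs(z)


def toy_value(z):
    return -10.0 * abs(z)


if __name__ == "__main__":
    actions, score = plan(1.0, toy_dynamics, toy_reward, toy_value)
    # In a receding-horizon loop only the first action would be executed,
    # after which planning restarts from the next observed state.
    print("first action:", actions[0], "estimated return:", score)
```

Because the terminal value summarizes the return past the horizon, the horizon can stay short, which is the trade-off the abstract points to between planning cost and long-term return estimation.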

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-hansen22a,
  title = {Temporal Difference Learning for Model Predictive Control},
  author = {Hansen, Nicklas A and Su, Hao and Wang, Xiaolong},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {8387--8406},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/hansen22a/hansen22a.pdf},
  url = {https://proceedings.mlr.press/v162/hansen22a.html},
  abstract = {Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and videos are available at https://nicklashansen.github.io/td-mpc.}
}

Endnote

%0 Conference Paper
%T Temporal Difference Learning for Model Predictive Control
%A Nicklas A Hansen
%A Hao Su
%A Xiaolong Wang
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-hansen22a
%I PMLR
%P 8387--8406
%U https://proceedings.mlr.press/v162/hansen22a.html
%V 162
%X Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and videos are available at https://nicklashansen.github.io/td-mpc.

APA


Hansen, N.A., Su, H. & Wang, X. (2022). Temporal Difference Learning for Model Predictive Control. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:8387-8406. Available from https://proceedings.mlr.press/v162/hansen22a.html.
