The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint

Stephen Tu, Benjamin Recht
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:3036-3083, 2019.

Abstract

The effectiveness of model-based versus model-free methods is a long-standing question in reinforcement learning (RL). Motivated by the recent empirical success of RL on continuous control tasks, we study the sample complexity of popular model-based and model-free algorithms on the Linear Quadratic Regulator (LQR). We show that for policy evaluation, a simple model-based plugin method requires asymptotically fewer samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of state dimension. For policy optimization, we study a simple family of problem instances and show that nominal (certainty equivalence principle) control also requires several factors of state and input dimension fewer samples than the policy gradient method to reach the same level of control performance on these instances. Furthermore, the gap persists even when employing baselines commonly used in practice. To the best of our knowledge, this is the first theoretical result that demonstrates a separation in the sample complexity between model-based and model-free methods on a continuous control task.
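
To make the policy-evaluation comparison above concrete, the sketch below contrasts the model-based plugin estimator (fit the closed-loop dynamics by least squares, then solve a Lyapunov equation) with LSTD on a small discounted LQR instance. The specific problem data (A, B, K, Q, R), the discount factor, the noise level, and the rollout scheme are illustrative assumptions for this sketch only; they are not the instances or constants analyzed in the paper.

# Illustrative sketch (not from the paper): model-based plugin vs. LSTD
# policy evaluation on a small discounted LQR instance.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)
n, d, gamma = 3, 2, 0.95
A = np.array([[0.6, 0.1, 0.0],
              [0.0, 0.5, 0.1],
              [0.1, 0.0, 0.4]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
K = np.array([[-0.1, 0.0, 0.0],
              [0.0, -0.1, 0.0]])      # fixed policy u = K x (stabilizing here)
Q, R = np.eye(n), np.eye(d)
L = A + B @ K                          # closed-loop dynamics
C = Q + K.T @ R @ K                    # per-step cost matrix under the policy

def rollout(T):
    """Simulate one closed-loop trajectory of length T with process noise."""
    xs = [rng.standard_normal(n)]
    for _ in range(T):
        xs.append(L @ xs[-1] + 0.1 * rng.standard_normal(n))
    return np.array(xs)

data = [rollout(20) for _ in range(200)]

# Model-based plugin: regress the closed-loop matrix, then solve for P.
X = np.vstack([traj[:-1] for traj in data])
Xp = np.vstack([traj[1:] for traj in data])
L_hat = np.linalg.lstsq(X, Xp, rcond=None)[0].T        # x_{t+1} ~ L_hat x_t
P_plugin = solve_discrete_lyapunov(np.sqrt(gamma) * L_hat.T, C)

# Model-free LSTD: quadratic features plus a bias term for the noise offset.
phi = lambda x: np.append(np.outer(x, x).ravel(), 1.0)
A_lstd = np.zeros((n * n + 1, n * n + 1))
b_lstd = np.zeros(n * n + 1)
for traj in data:
    for x, xp in zip(traj[:-1], traj[1:]):
        A_lstd += np.outer(phi(x), phi(x) - gamma * phi(xp))
        b_lstd += phi(x) * (x @ C @ x)
theta = np.linalg.lstsq(A_lstd, b_lstd, rcond=None)[0]
P_lstd = theta[:-1].reshape(n, n)
P_lstd = 0.5 * (P_lstd + P_lstd.T)                     # symmetrize

# Compare both estimates to the exact discounted value matrix.
P_true = solve_discrete_lyapunov(np.sqrt(gamma) * L.T, C)
print("plugin error:", np.linalg.norm(P_plugin - P_true, "fro"))
print("LSTD error:  ", np.linalg.norm(P_lstd - P_true, "fro"))

Sweeping the number of rollouts and averaging the two errors over random seeds is one simple way to probe, empirically, the dimension-dependent gap described in the abstract.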

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-tu19a,
  title = {The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint},
  author = {Tu, Stephen and Recht, Benjamin},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages = {3036--3083},
  year = {2019},
  editor = {Beygelzimer, Alina and Hsu, Daniel},
  volume = {99},
  series = {Proceedings of Machine Learning Research},
  month = {25--28 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v99/tu19a/tu19a.pdf},
  url = {https://proceedings.mlr.press/v99/tu19a.html},
  abstract = {The effectiveness of model-based versus model-free methods is a long-standing question in reinforcement learning (RL). Motivated by recent empirical success of RL on continuous control tasks, we study the sample complexity of popular model-based and model-free algorithms on the Linear Quadratic Regulator (LQR). We show that for policy evaluation, a simple model-based plugin method requires asymptotically less samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of state dimension. For policy optimization, we study a simple family of problem instances and show that nominal (certainty equivalence principle) control also requires several factors of state and input dimension fewer samples than the policy gradient method to reach the same level of control performance on these instances. Furthermore, the gap persists even when employing baselines commonly used in practice. To the best of our knowledge, this is the first theoretical result which demonstrates a separation in the sample complexity between model-based and model-free methods on a continuous control task.}
}
Endnote
%0 Conference Paper
%T The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint
%A Stephen Tu
%A Benjamin Recht
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-tu19a
%I PMLR
%P 3036--3083
%U https://proceedings.mlr.press/v99/tu19a.html
%V 99
%X The effectiveness of model-based versus model-free methods is a long-standing question in reinforcement learning (RL). Motivated by recent empirical success of RL on continuous control tasks, we study the sample complexity of popular model-based and model-free algorithms on the Linear Quadratic Regulator (LQR). We show that for policy evaluation, a simple model-based plugin method requires asymptotically less samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of state dimension. For policy optimization, we study a simple family of problem instances and show that nominal (certainty equivalence principle) control also requires several factors of state and input dimension fewer samples than the policy gradient method to reach the same level of control performance on these instances. Furthermore, the gap persists even when employing baselines commonly used in practice. To the best of our knowledge, this is the first theoretical result which demonstrates a separation in the sample complexity between model-based and model-free methods on a continuous control task.
APA
Tu, S. & Recht, B. (2019). The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:3036-3083. Available from https://proceedings.mlr.press/v99/tu19a.html.
