Model-Free Linear Quadratic Control via Reduction to Expert Prediction

Yasin Abbasi-Yadkori; Nevena Lazic; Csaba Szepesvari

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:3108-3117, 2019.

Abstract

Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$ if time horizon satisfies $T>C^{1/\xi}$ for a constant $C$. The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions. This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.
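As a reading aid, the short calculation below spells out why the stated bound is sublinear and how the horizon condition interacts with the choice of $\xi$. It is only a sketch based on the abstract: the symbol $R_T$ for the regret after $T$ steps is notation assumed here, and the calculation additionally assumes $C > 1$ and $T > 1$, which the abstract does not state.

```latex
% Assumed notation: R_T denotes the regret after T steps (not named in the abstract).
% Sublinearity: for 0 < \xi < 1/3 the exponent \xi + 2/3 is strictly below 1, so
\[
  R_T = O\!\left(T^{\xi + 2/3}\right)
  \quad\Longrightarrow\quad
  \frac{R_T}{T} = O\!\left(T^{\xi - 1/3}\right) \longrightarrow 0
  \quad \text{as } T \to \infty .
\]
% Horizon condition: assuming C > 1 and T > 1, taking logarithms gives
\[
  T > C^{1/\xi}
  \;\Longleftrightarrow\;
  T^{\xi} > C
  \;\Longleftrightarrow\;
  \xi > \frac{\log C}{\log T} .
\]
% Illustrative instance: \xi = 0.1 requires T > C^{10}.
```

Read this way, a smaller $\xi$ brings the exponent closer to $2/3$ but requires a longer horizon before the bound applies.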

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-abbasi-yadkori19a,
  title     = {Model-Free Linear Quadratic Control via Reduction to Expert Prediction},
  author    = {Abbasi-Yadkori, Yasin and Lazic, Nevena and Szepesvari, Csaba},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {3108--3117},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/abbasi-yadkori19a/abbasi-yadkori19a.pdf},
  url       = {https://proceedings.mlr.press/v89/abbasi-yadkori19a.html},
  abstract = 	 {Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL.  In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$ if time horizon satisfies $T>C^{1/\xi}$ for a constant $C$. The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions.  This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.}
}

Endnote

%0 Conference Paper
%T Model-Free Linear Quadratic Control via Reduction to Expert Prediction
%A Yasin Abbasi-Yadkori
%A Nevena Lazic
%A Csaba Szepesvari
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-abbasi-yadkori19a
%I PMLR
%P 3108--3117
%U https://proceedings.mlr.press/v89/abbasi-yadkori19a.html
%V 89
%X Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL.  In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$ if time horizon satisfies $T>C^{1/\xi}$ for a constant $C$. The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions.  This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.

APA


Abbasi-Yadkori, Y., Lazic, N. & Szepesvari, C. (2019). Model-Free Linear Quadratic Control via Reduction to Expert Prediction. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:3108-3117. Available from https://proceedings.mlr.press/v89/abbasi-yadkori19a.html.
