When to stop value iteration: stability and near-optimality versus computation

Mathieu Granzotto; Romain Postoyan; Dragan Nešić; Lucian Buşoniu; Jamal Daafouz

Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:412-424, 2021.

Abstract

Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool for generating inputs with desirable properties for the controlled system, such as optimality and Lyapunov stability. As VI usually requires an infinite number of iterations to solve general nonlinear optimal control problems, a key question is when to terminate the algorithm to produce a “good” solution, with a measurable impact on optimality and stability guarantees. By carefully analysing VI under general stabilizability and detectability properties, we provide explicit and novel relationships quantifying the stopping criterion’s impact on near-optimality, stability and performance, thus allowing one to tune these desirable properties against the induced computational cost. The considered class of stopping criteria encompasses those encountered in the control, dynamic programming and reinforcement learning literature, and it allows considering new ones, which may be useful to further reduce the computational cost while ensuring stability and near-optimality properties. We therefore lay a foundation to endow machine learning schemes based on VI with stability and performance guarantees, while reducing computational complexity.
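
To make the trade-off described above concrete, here is a minimal sketch of undiscounted VI on a toy deterministic system with finitely many states and inputs. It starts from V_0 = 0, applies V_{d+1}(x) = min_u [l(x, u) + V_d(f(x, u))], and stops once the sup-norm change between successive iterates drops below a tolerance `eps`. That is one common stopping rule from the dynamic programming literature and is not claimed to be the criterion analysed in the paper. The function names, the five-state system and the quadratic stage cost are all hypothetical illustrations. The sketch does not establish any near-optimality or stability guarantee; relating such rules to those guarantees is the question the abstract addresses.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int


def value_iteration(
    states: List[State],
    actions: List[Action],
    step: Callable[[State, Action], State],    # hypothetical dynamics x+ = f(x, u)
    cost: Callable[[State, Action], float],    # hypothetical stage cost l(x, u) >= 0
    eps: float = 1e-6,                          # stopping tolerance (an assumption)
    max_iters: int = 1000,
) -> Tuple[Dict[State, float], int]:
    """Undiscounted VI from V_0 = 0, stopped when max_x |V_{d+1}(x) - V_d(x)| <= eps."""
    values = {x: 0.0 for x in states}  # V_0 = 0
    for d in range(1, max_iters + 1):
        # Bellman update: V_d(x) = min_u [ l(x, u) + V_{d-1}(f(x, u)) ]
        new_values = {
            x: min(cost(x, u) + values[step(x, u)] for u in actions)
            for x in states
        }
        gap = max(abs(new_values[x] - values[x]) for x in states)
        values = new_values
        if gap <= eps:  # stopping criterion: successive iterates are close
            return values, d
    return values, max_iters


def greedy_policy(
    states: List[State],
    actions: List[Action],
    step: Callable[[State, Action], State],
    cost: Callable[[State, Action], float],
    values: Dict[State, float],
) -> Dict[State, Action]:
    """Input minimising l(x, u) + V(f(x, u)) at each state, for the V VI returned."""
    return {
        x: min(actions, key=lambda u: cost(x, u) + values[step(x, u)])
        for x in states
    }


if __name__ == "__main__":
    # Toy system: states 0..4 on a line, inputs -1/0/+1, target state 0.
    states = list(range(5))
    actions = [-1, 0, 1]
    step = lambda x, u: min(max(x + u, 0), 4)
    cost = lambda x, u: float(x**2 + u**2)

    for eps in (1e-6, 2.5):
        values, iters = value_iteration(states, actions, step, cost, eps=eps)
        policy = greedy_policy(states, actions, step, cost, values)
        print(f"eps={eps}: stopped after {iters} iterations, V={values}, policy={policy}")
```

On this toy problem a tight tolerance runs until the iterates stop changing, at V = {0: 0, 1: 2, 2: 7, 3: 17, 4: 34} after 6 iterations. A looser tolerance (`eps=2.5`) stops after 4 iterations with V(4) = 33, which slightly underestimates the cost from state 4, while the greedy input still steers every state towards 0. This is a small instance of trading computation against near-optimality.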

Cite this Paper

BibTeX

@InProceedings{pmlr-v144-granzotto21a,
  title = 	 {When to stop value iteration: stability and near-optimality versus computation},
  author =       {Granzotto, Mathieu and Postoyan, Romain and Ne\v{s}i\'{c}, Dragan and Bu\c{s}oniu, Lucian and Daafouz, Jamal},
  booktitle = 	 {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
  pages = 	 {412--424},
  year = 	 {2021},
  editor = 	 {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
  volume = 	 {144},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07 -- 08 June},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v144/granzotto21a/granzotto21a.pdf},
  url = 	 {https://proceedings.mlr.press/v144/granzotto21a.html},
  abstract = 	 { Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like optimality and Lyapunov stability. As VI usually requires an infinite number of iterations to solve general nonlinear optimal control problems, a key question is when to terminate the algorithm to produce a “good” solution, with a measurable impact on optimality and stability guarantees. By carefully analysing VI under general stabilizability and detectability properties, we provide explicit and novel relationships of the stopping criterion’s impact on near-optimality, stability and performance, thus allowing to tune these desirable properties against the induced computational cost. The considered class of stopping criteria encompasses those encountered in the control, dynamic programming and reinforcement learning literature and it allows considering new ones, which may be useful to further reduce the computational cost while endowing and satisfying stability and near-optimality properties. We therefore lay a foundation to endow machine learning schemes based on VI with stability and performance guarantees, while reducing computational complexity.}
}

Endnote

%0 Conference Paper
%T When to stop value iteration: stability and near-optimality versus computation
%A Mathieu Granzotto
%A Romain Postoyan
%A Dragan Nešić
%A Lucian Buşoniu
%A Jamal Daafouz
%B Proceedings of the 3rd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2021
%E Ali Jadbabaie
%E John Lygeros
%E George J. Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire J. Tomlin
%E Melanie N. Zeilinger
%F pmlr-v144-granzotto21a
%I PMLR
%P 412--424
%U https://proceedings.mlr.press/v144/granzotto21a.html
%V 144
%X  Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like optimality and Lyapunov stability. As VI usually requires an infinite number of iterations to solve general nonlinear optimal control problems, a key question is when to terminate the algorithm to produce a “good” solution, with a measurable impact on optimality and stability guarantees. By carefully analysing VI under general stabilizability and detectability properties, we provide explicit and novel relationships of the stopping criterion’s impact on near-optimality, stability and performance, thus allowing to tune these desirable properties against the induced computational cost. The considered class of stopping criteria encompasses those encountered in the control, dynamic programming and reinforcement learning literature and it allows considering new ones, which may be useful to further reduce the computational cost while endowing and satisfying stability and near-optimality properties. We therefore lay a foundation to endow machine learning schemes based on VI with stability and performance guarantees, while reducing computational complexity.

APA

Granzotto, M., Postoyan, R., Nešić, D., Buşoniu, L. & Daafouz, J. (2021). When to stop value iteration: stability and near-optimality versus computation. Proceedings of the 3rd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 144:412-424. Available from https://proceedings.mlr.press/v144/granzotto21a.html.
