On the nonsmooth geometry and neural approximation of the optimal value function of infinite-horizon pendulum swing-up

Haoyu Han; Heng Yang

Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:654-666, 2024.

Abstract

We revisit the inverted pendulum problem with the goal of understanding and computing the true optimal value function. We start with an observation that the true optimal value function must be nonsmooth (i.e., not globally C1) due to symmetry of the problem. We then give a result that can certify the optimality of a candidate piece-wise C1 value function. Further, for a candidate value function obtained via numerical approximation, we provide a bound of suboptimality based on its Hamilton-Jacobi-Bellman (HJB) equation residuals. Inspired by Holzhüter (2004), we then design an algorithm that solves backwards the Pontryagin’s minimum principle (PMP) ODE from terminal conditions provided by the locally optimal LQR value function. This numerical procedure leads to a piece-wise C1 value function whose nonsmooth region contains periodic spiral lines and smooth regions attain HJB residuals about $10^{-4}$, hence certified to be the optimal value function up to minor numerical inaccuracies. This optimal value function checks the power of optimality: (i) it sits above a polynomial lower bound; (ii) its induced controller globally swings up and stabilizes the pendulum, and (iii) attains lower trajectory cost than baseline methods such as energy shaping, model predictive control (MPC), and proximal policy optimization (with MPC attaining almost the same cost). We conclude by distilling the optimal value function into a simple neural network.
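To make the notion of an HJB residual concrete, here is a minimal sketch, not the authors' code. It evaluates the residual min_u [ cost(x, u) + ∇V(x) · f(x, u) ] for the quadratic LQR value function of a damped pendulum linearized about the upright position. The dynamics, cost weights, and parameter values are all assumptions chosen for illustration; the abstract does not specify them. Under these assumptions the residual is nearly zero close to the upright state and grows far from it, which is consistent with the LQR value function being only locally optimal.

```python
import math

# Hypothetical parameters (NOT from the paper): theta is measured from upright,
# dynamics are theta' = omega, omega' = A_GRAV*sin(theta) - B_DAMP*omega + C_IN*u,
# and the running cost is Q1*theta^2 + Q2*omega^2 + R*u^2.
A_GRAV, B_DAMP, C_IN = 9.81, 0.1, 1.0
Q1, Q2, R = 1.0, 1.0, 1.0


def lqr_value_matrix():
    """Closed-form stabilizing solution (p11, p12, p22) of the 2x2 Riccati
    equation for the linearization at upright, so that V(x) = x^T P x."""
    s = C_IN ** 2 / R
    p12 = (A_GRAV + math.sqrt(A_GRAV ** 2 + s * Q1)) / s
    p22 = (-B_DAMP + math.sqrt(B_DAMP ** 2 + s * (2.0 * p12 + Q2))) / s
    p11 = B_DAMP * p12 - A_GRAV * p22 + s * p12 * p22
    return p11, p12, p22


def hjb_residual(theta, omega, P):
    """HJB residual of V(x) = x^T P x on the nonlinear pendulum, with the
    minimizing (unconstrained) control substituted in."""
    p11, p12, p22 = P
    v_theta = 2.0 * (p11 * theta + p12 * omega)   # dV/dtheta
    v_omega = 2.0 * (p12 * theta + p22 * omega)   # dV/domega
    u = -C_IN * v_omega / (2.0 * R)               # argmin of R*u^2 + v_omega*C_IN*u
    cost = Q1 * theta ** 2 + Q2 * omega ** 2 + R * u ** 2
    theta_dot = omega
    omega_dot = A_GRAV * math.sin(theta) - B_DAMP * omega + C_IN * u
    return cost + v_theta * theta_dot + v_omega * omega_dot


P = lqr_value_matrix()
near = hjb_residual(0.01, 0.0, P)   # close to upright
far = hjb_residual(1.0, 0.0, P)     # about 57 degrees from upright
print(f"residual near upright: {near:.2e}, far from upright: {far:.2e}")
```

The control here is unconstrained, which is also an assumption; a torque limit would change the minimizing control and hence the residual.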

Cite this Paper

BibTeX


@InProceedings{pmlr-v242-han24a,
  title = 	 {On the nonsmooth geometry and neural approximation of the optimal value function of infinite-horizon pendulum swing-up},
  author =       {Han, Haoyu and Yang, Heng},
  booktitle = 	 {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages = 	 {654--666},
  year = 	 {2024},
  editor = 	 {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume = 	 {242},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--17 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v242/han24a/han24a.pdf},
  url = 	 {https://proceedings.mlr.press/v242/han24a.html},
  abstract = 	 {We revisit the inverted pendulum problem with the goal of understanding and computing the true optimal value function. We start with an observation that the true optimal value function must be nonsmooth (i.e., not globally C1) due to symmetry of the problem. We then give a result that can certify the optimality of a candidate piece-wise C1 value function. Further, for a candidate value function obtained via numerical approximation, we provide a bound of suboptimality based on its Hamilton-Jacobi-Bellman (HJB) equation residuals. Inspired by Holzhüter (2004), we then design an algorithm that solves backwards the Pontryagin’s minimum principle (PMP) ODE from terminal conditions provided by the locally optimal LQR value function. This numerical procedure leads to a piece-wise C1 value function whose nonsmooth region contains periodic spiral lines and smooth regions attain HJB residuals about $10^{-4}$, hence certified to be the optimal value function up to minor numerical inaccuracies. This optimal value function checks the power of optimality: (i) it sits above a polynomial lower bound; (ii) its induced controller globally swings up and stabilizes the pendulum, and (iii) attains lower trajectory cost than baseline methods such as energy shaping, model predictive control (MPC), and proximal policy optimization (with MPC attaining almost the same cost). We conclude by distilling the optimal value function into a simple neural network.}
}

Endnote

%0 Conference Paper
%T On the nonsmooth geometry and neural approximation of the optimal value function of infinite-horizon pendulum swing-up
%A Haoyu Han
%A Heng Yang
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-han24a
%I PMLR
%P 654--666
%U https://proceedings.mlr.press/v242/han24a.html
%V 242
%X We revisit the inverted pendulum problem with the goal of understanding and computing the true optimal value function. We start with an observation that the true optimal value function must be nonsmooth (i.e., not globally C1) due to symmetry of the problem. We then give a result that can certify the optimality of a candidate piece-wise C1 value function. Further, for a candidate value function obtained via numerical approximation, we provide a bound of suboptimality based on its Hamilton-Jacobi-Bellman (HJB) equation residuals. Inspired by Holzhüter (2004), we then design an algorithm that solves backwards the Pontryagin’s minimum principle (PMP) ODE from terminal conditions provided by the locally optimal LQR value function. This numerical procedure leads to a piece-wise C1 value function whose nonsmooth region contains periodic spiral lines and smooth regions attain HJB residuals about $10^{-4}$, hence certified to be the optimal value function up to minor numerical inaccuracies. This optimal value function checks the power of optimality: (i) it sits above a polynomial lower bound; (ii) its induced controller globally swings up and stabilizes the pendulum, and (iii) attains lower trajectory cost than baseline methods such as energy shaping, model predictive control (MPC), and proximal policy optimization (with MPC attaining almost the same cost). We conclude by distilling the optimal value function into a simple neural network.

APA


Han, H. & Yang, H. (2024). On the nonsmooth geometry and neural approximation of the optimal value function of infinite-horizon pendulum swing-up. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:654-666. Available from https://proceedings.mlr.press/v242/han24a.html.
