PID Accelerated Value Iteration Algorithm

Amir-Massoud Farahmand, Mohammad Ghavamzadeh
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3143-3153, 2021.

Abstract

The convergence rate of Value Iteration (VI), a fundamental procedure in dynamic programming and reinforcement learning, for solving MDPs can be slow when the discount factor is close to one. We propose modifications to VI in order to potentially accelerate its convergence behaviour. The key insight is the realization that the evolution of the value function approximations $(V_k)_{k \geq 0}$ in the VI procedure can be seen as a dynamical system. This opens up the possibility of using techniques from \emph{control theory} to modify, and potentially accelerate, this dynamics. We present such modifications based on simple controllers, such as PD (Proportional-Derivative), PI (Proportional-Integral), and PID. We present the error dynamics of these variants of VI, and provably (for certain classes of MDPs) and empirically (for more general classes) show that the convergence rate can be significantly improved. We also propose a gain adaptation mechanism in order to automatically select the controller gains, and empirically show the effectiveness of this procedure.
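The abstract's core idea, treating the VI iterates as a dynamical system whose error signal (the Bellman residual) is fed through a PID controller, can be sketched for policy evaluation on a small synthetic Markov chain. The gains `kp`, `ki`, `kd` and their values below are illustrative assumptions, not the paper's exact formulation or its gain-adaptation mechanism; a symmetric weight matrix is used so the chain is reversible (real spectrum), which keeps the momentum-style derivative term stable in this example.

```python
import numpy as np

# Synthetic policy-evaluation problem: V = r + gamma * P V.
rng = np.random.default_rng(0)
n, gamma = 20, 0.99
A = rng.random((n, n))
A = (A + A.T) / 2                      # symmetric weights -> reversible chain
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
r = rng.random(n)                      # reward vector
V_star = np.linalg.solve(np.eye(n) - gamma * P, r)  # exact fixed point

def bellman(V):
    return r + gamma * P @ V           # Bellman operator T for this policy

def pid_vi(kp=1.0, ki=0.0, kd=0.0, alpha=0.05, beta=0.95,
           max_iters=5000, tol=1e-6):
    """PID-style VI: V_{k+1} = V_k + kp*e_k + ki*z_k + kd*(V_k - V_{k-1}),
    where e_k = T V_k - V_k is the Bellman residual (the 'error' signal)."""
    V = V_prev = np.zeros(n)
    z = np.zeros(n)
    for k in range(max_iters):
        e = bellman(V) - V
        if np.max(np.abs(e)) < tol:
            return V, k
        z = beta * z + alpha * e       # integral term: discounted error sum
        V, V_prev = V + kp * e + ki * z + kd * (V - V_prev), V
    return V, max_iters

# kp=1, ki=kd=0 recovers standard value iteration.
V_plain, k_plain = pid_vi()
# A derivative (momentum-like) term; kd=0.8 is an illustrative choice.
V_pd, k_pd = pid_vi(kd=0.8)
print(f"standard VI: {k_plain} iterations, PD variant: {k_pd} iterations")
```

With the discount factor this close to one, plain VI's residual shrinks by a factor of roughly 0.99 per iteration, while the derivative term damps the slow mode and cuts the iteration count substantially on this instance.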

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-farahmand21a,
  title     = {PID Accelerated Value Iteration Algorithm},
  author    = {Farahmand, Amir-Massoud and Ghavamzadeh, Mohammad},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {3143--3153},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/farahmand21a/farahmand21a.pdf},
  url       = {https://proceedings.mlr.press/v139/farahmand21a.html},
  abstract  = {The convergence rate of Value Iteration (VI), a fundamental procedure in dynamic programming and reinforcement learning, for solving MDPs can be slow when the discount factor is close to one. We propose modifications to VI in order to potentially accelerate its convergence behaviour. The key insight is the realization that the evolution of the value function approximations $(V_k)_{k \geq 0}$ in the VI procedure can be seen as a dynamical system. This opens up the possibility of using techniques from \emph{control theory} to modify, and potentially accelerate, this dynamics. We present such modifications based on simple controllers, such as PD (Proportional-Derivative), PI (Proportional-Integral), and PID. We present the error dynamics of these variants of VI, and provably (for certain classes of MDPs) and empirically (for more general classes) show that the convergence rate can be significantly improved. We also propose a gain adaptation mechanism in order to automatically select the controller gains, and empirically show the effectiveness of this procedure.}
}
Endnote
%0 Conference Paper
%T PID Accelerated Value Iteration Algorithm
%A Amir-Massoud Farahmand
%A Mohammad Ghavamzadeh
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-farahmand21a
%I PMLR
%P 3143--3153
%U https://proceedings.mlr.press/v139/farahmand21a.html
%V 139
%X The convergence rate of Value Iteration (VI), a fundamental procedure in dynamic programming and reinforcement learning, for solving MDPs can be slow when the discount factor is close to one. We propose modifications to VI in order to potentially accelerate its convergence behaviour. The key insight is the realization that the evolution of the value function approximations $(V_k)_{k \geq 0}$ in the VI procedure can be seen as a dynamical system. This opens up the possibility of using techniques from \emph{control theory} to modify, and potentially accelerate, this dynamics. We present such modifications based on simple controllers, such as PD (Proportional-Derivative), PI (Proportional-Integral), and PID. We present the error dynamics of these variants of VI, and provably (for certain classes of MDPs) and empirically (for more general classes) show that the convergence rate can be significantly improved. We also propose a gain adaptation mechanism in order to automatically select the controller gains, and empirically show the effectiveness of this procedure.
APA
Farahmand, A. & Ghavamzadeh, M. (2021). PID Accelerated Value Iteration Algorithm. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3143-3153. Available from https://proceedings.mlr.press/v139/farahmand21a.html.