Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3015-3024, 2020.

Abstract

Inspired by the widespread use of Q-learning algorithms in reinforcement learning (RL), this paper studies a class of biased stochastic approximation (SA) procedures under an ‘ergodic-like’ assumption on the underlying stochastic noise sequence. Leveraging a multistep Lyapunov function that looks ahead to several future updates to accommodate the gradient bias, we prove a general convergence result for the iterates and use it to derive finite-time bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint enables finite-time analysis of biased SA algorithms under a broad family of stochastic perturbations. For direct comparison with past works, we also demonstrate these bounds by applying them to Q-learning with linear function approximation under the realistic Markov chain observation model. The resulting finite-time error bound for Q-learning is the first of its kind, in the sense that it holds i) for the unmodified version (i.e., without any modifications to the updates) and ii) for Markov chains starting from any initial distribution, at least one of which must be violated for existing results to apply.
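For concreteness, the two updates in question take the following standard forms (a sketch in textbook notation, which may differ from the paper's; here α is a constant stepsize, {X_k} the underlying Markovian noise sequence, φ the feature map, and γ the discount factor):

% Biased SA: the increment is a noisy estimate of a mean field; under Markovian noise its
% conditional expectation need not equal the mean field, hence the (gradient) bias.
\theta_{k+1} = \theta_k + \alpha \, h(\theta_k, X_k)

% Q-learning with linear function approximation Q_\theta(s,a) = \phi(s,a)^\top \theta,
% driven by a single Markov-chain trajectory (s_k, a_k, r_k, s_{k+1}):
\theta_{k+1} = \theta_k + \alpha \, \phi(s_k, a_k)
\Bigl( r_k + \gamma \max_{a'} \phi(s_{k+1}, a')^\top \theta_k - \phi(s_k, a_k)^\top \theta_k \Bigr)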

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-wang20h,
  title     = {Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning},
  author    = {Wang, Gang and Giannakis, Georgios B.},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {3015--3024},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/wang20h/wang20h.pdf},
  url       = {https://proceedings.mlr.press/v108/wang20h.html},
  abstract  = {Inspired by the widespread use of Q-learning algorithms in reinforcement learning (RL), this present paper studies a class of biased stochastic approximation (SA) procedures under an ‘ergodic-like’ assumption on the underlying stochastic noise sequence. Leveraging a \emph{multistep Lyapunov function} that looks ahead to several future updates to accommodate the gradient bias, we prove a general result on the convergence of the iterates, and use it to derive finite-time bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint renders the finite-time analysis of \emph{biased SA} algorithms under a broad family of stochastic perturbations possible. For direct comparison with past works, we also demonstrate these bounds by applying them to Q-learning with linear function approximation, under the realistic Markov chain observation model. The resultant finite-time error bound for Q-learning is \emph{the first of its kind}, in the sense that it holds: i) for the unmodified version (i.e., without making any modifications to the updates), and ii), for Markov chains starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.}
}
Endnote
%0 Conference Paper
%T Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning
%A Gang Wang
%A Georgios B. Giannakis
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-wang20h
%I PMLR
%P 3015--3024
%U https://proceedings.mlr.press/v108/wang20h.html
%V 108
%X Inspired by the widespread use of Q-learning algorithms in reinforcement learning (RL), this present paper studies a class of biased stochastic approximation (SA) procedures under an ‘ergodic-like’ assumption on the underlying stochastic noise sequence. Leveraging a \emph{multistep Lyapunov function} that looks ahead to several future updates to accommodate the gradient bias, we prove a general result on the convergence of the iterates, and use it to derive finite-time bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint renders the finite-time analysis of \emph{biased SA} algorithms under a broad family of stochastic perturbations possible. For direct comparison with past works, we also demonstrate these bounds by applying them to Q-learning with linear function approximation, under the realistic Markov chain observation model. The resultant finite-time error bound for Q-learning is \emph{the first of its kind}, in the sense that it holds: i) for the unmodified version (i.e., without making any modifications to the updates), and ii), for Markov chains starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.
APA
Wang, G. & Giannakis, G.B. (2020). Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3015-3024. Available from https://proceedings.mlr.press/v108/wang20h.html.
