Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Yue Wang, Shaofeng Zou
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:11-20, 2020.

Abstract

Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and TDC, where their objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-wang20a, title = {Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise}, author = {Wang, Yue and Zou, Shaofeng}, booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {11--20}, year = {2020}, editor = {Jonas Peters and David Sontag}, volume = {124}, series = {Proceedings of Machine Learning Research}, month = {03--06 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v124/wang20a/wang20a.pdf}, url = { http://proceedings.mlr.press/v124/wang20a.html }, abstract = {Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and TDC, where their objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.} }
Endnote
%0 Conference Paper %T Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise %A Yue Wang %A Shaofeng Zou %B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) %C Proceedings of Machine Learning Research %D 2020 %E Jonas Peters %E David Sontag %F pmlr-v124-wang20a %I PMLR %P 11--20 %U http://proceedings.mlr.press/v124/wang20a.html %V 124 %X Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and TDC, where their objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
APA
Wang, Y. & Zou, S.. (2020). Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:11-20 Available from http://proceedings.mlr.press/v124/wang20a.html .

Related Material