Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Yue Wang; Shaofeng Zou

Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:11-20, 2020.

Abstract

Greedy-GQ is an off-policy two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing the stepsizes of this two-timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two-timescale methods such as GTD, GTD2 and TDC, whose objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is not a linear two-timescale stochastic approximation algorithm. Our techniques provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.

Cite this Paper

BibTeX


@InProceedings{pmlr-v124-wang20a,
  title     = {Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise},
  author    = {Wang, Yue and Zou, Shaofeng},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages     = {11--20},
  year      = {2020},
  editor    = {Peters, Jonas and Sontag, David},
  volume    = {124},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v124/wang20a/wang20a.pdf},
  url       = {https://proceedings.mlr.press/v124/wang20a.html},
  abstract  = {Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and TDC, where their objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.}
}

Endnote

%0 Conference Paper
%T Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise
%A Yue Wang
%A Shaofeng Zou
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-wang20a
%I PMLR
%P 11-20
%U https://proceedings.mlr.press/v124/wang20a.html
%V 124
%X Greedy-GQ is an off-policy two timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and TDC, where their objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.

APA


Wang, Y. & Zou, S. (2020). Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:11-20. Available from https://proceedings.mlr.press/v124/wang20a.html.
