A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Pan Xu; Quanquan Gu

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10555-10565, 2020.

Abstract

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.
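For readers unfamiliar with the setting, the following is a minimal illustrative sketch of semi-gradient Q-learning with a ReLU network, trained on a single Markovian trajectory (so consecutive samples are not i.i.d.). It is not the algorithm analyzed in the paper: the two-layer network (the paper considers a deep one), the one-hot features, the random MDP, the uniform behaviour policy, the constant step size, and all function names are assumptions made for illustration only.

```python
import numpy as np

def features(s, a, n_actions, dim):
    """One-hot encoding of a (state, action) pair (a hypothetical choice)."""
    x = np.zeros(dim)
    x[s * n_actions + a] = 1.0
    return x

def q_value(W, b, x):
    """Two-layer ReLU network: Q(x) = b^T relu(W x) / sqrt(m)."""
    return b @ np.maximum(W @ x, 0.0) / np.sqrt(W.shape[0])

def q_grad(W, b, x):
    """Gradient of q_value with respect to W; output weights b stay fixed."""
    active = (W @ x > 0).astype(float)  # ReLU derivative per hidden unit
    return np.outer(b * active, x) / np.sqrt(W.shape[0])

def neural_q_learning(P, R, gamma=0.9, m=256, T=2000, lr=0.1, seed=0):
    """Run T semi-gradient Q-learning steps along one trajectory of the MDP.

    P has shape (S, A, S) with transition probabilities, R has shape (S, A).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    dim = n_states * n_actions
    W = rng.normal(size=(m, dim))        # hidden-layer weights (trained)
    b = rng.choice([-1.0, 1.0], size=m)  # output weights (kept fixed)
    s = 0
    for _ in range(T):
        a = rng.integers(n_actions)      # uniform behaviour policy (assumption)
        s_next = rng.choice(n_states, p=P[s, a])
        x = features(s, a, n_actions, dim)
        # Bootstrapped target: r + gamma * max_a' Q(s', a')
        next_q = max(
            q_value(W, b, features(s_next, a2, n_actions, dim))
            for a2 in range(n_actions)
        )
        delta = q_value(W, b, x) - (R[s, a] + gamma * next_q)
        # Semi-gradient step: the target is treated as a constant
        W -= lr * delta * q_grad(W, b, x)
        s = s_next                       # continue the same trajectory
    return W, b

# Small random MDP with 4 states and 2 actions (made-up data)
mdp_rng = np.random.default_rng(1)
P = mdp_rng.dirichlet(np.ones(4), size=(4, 2))  # shape (4, 2, 4)
R = mdp_rng.uniform(size=(4, 2))
W, b = neural_q_learning(P, R)
```

The hidden width `m` stands in for the overparameterization mentioned in the abstract, and `T` is the number of iterations; the sketch makes no claim about reproducing the stated convergence rate.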

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-xu20c,
  title = 	 {A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation},
  author =       {Xu, Pan and Gu, Quanquan},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {10555--10565},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/xu20c/xu20c.pdf},
  url = 	 {https://proceedings.mlr.press/v119/xu20c.html},
  abstract = 	 {Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.}
}

Endnote

%0 Conference Paper
%T A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
%A Pan Xu
%A Quanquan Gu
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-xu20c
%I PMLR
%P 10555--10565
%U https://proceedings.mlr.press/v119/xu20c.html
%V 119
%X Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.

APA


Xu, P. & Gu, Q. (2020). A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10555-10565. Available from https://proceedings.mlr.press/v119/xu20c.html.
