A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Pan Xu, Quanquan Gu
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10555-10565, 2020.

Abstract

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under a non-i.i.d. data assumption.
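
The following is a minimal sketch (not taken from the paper) of the kind of update the abstract describes: Q-learning in which the action-value function is approximated by an overparameterized ReLU network, with updates driven by single transitions from an MDP trajectory, i.e. non-i.i.d. data. The names QNetwork and q_learning_step, the network width, and the optimizer are illustrative assumptions, not details of the authors' algorithm or analysis.

# Sketch of one neural Q-learning step with a ReLU network approximator.
# Assumes PyTorch; widths, step sizes, and the transition interface are illustrative.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """ReLU network mapping a state to Q-values for all actions."""
    def __init__(self, state_dim, num_actions, width=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def q_learning_step(q_net, optimizer, state, action, reward, next_state, gamma=0.99):
    """One semi-gradient update on a single transition (s, a, r, s') from the trajectory."""
    with torch.no_grad():
        # Bootstrapped target r + gamma * max_a' Q(s', a'); no gradient flows through it.
        target = reward + gamma * q_net(next_state).max()
    q_sa = q_net(state)[action]
    loss = 0.5 * (q_sa - target) ** 2  # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (illustrative): q_net = QNetwork(state_dim=4, num_actions=2)
# optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3), then call
# q_learning_step on each consecutive transition of the sampled trajectory.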

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-xu20c,
  title     = {A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation},
  author    = {Xu, Pan and Gu, Quanquan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {10555--10565},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/xu20c/xu20c.pdf},
  url       = {https://proceedings.mlr.press/v119/xu20c.html},
  abstract  = {Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.}
}
Endnote
%0 Conference Paper
%T A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
%A Pan Xu
%A Quanquan Gu
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-xu20c
%I PMLR
%P 10555--10565
%U https://proceedings.mlr.press/v119/xu20c.html
%V 119
%X Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.
APA
Xu, P. & Gu, Q. (2020). A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10555-10565. Available from https://proceedings.mlr.press/v119/xu20c.html.