On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

Mudit Gaur; Vaneet Aggarwal; Mridul Agarwal

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11013-11049, 2023.

Abstract

Deep Q-learning based algorithms have been applied successfully in many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and derive sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration using a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require assumptions such as a linear or low-rank structure on the MDP.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-gaur23a,
  title = 	 {On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization},
  author =       {Gaur, Mudit and Aggarwal, Vaneet and Agarwal, Mridul},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {11013--11049},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/gaur23a/gaur23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/gaur23a.html},
  abstract = 	 {Deep Q-learning based algorithms have been applied successfully in many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and derive sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration using a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require assumptions such as a linear or low-rank structure on the MDP.}
}

Endnote

%0 Conference Paper
%T On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
%A Mudit Gaur
%A Vaneet Aggarwal
%A Mridul Agarwal
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-gaur23a
%I PMLR
%P 11013--11049
%U https://proceedings.mlr.press/v202/gaur23a.html
%V 202
%X Deep Q-learning based algorithms have been applied successfully in many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and derive sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration using a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require assumptions such as a linear or low-rank structure on the MDP.

APA

Gaur, M., Aggarwal, V. & Agarwal, M. (2023). On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:11013-11049. Available from https://proceedings.mlr.press/v202/gaur23a.html.
