Sample Complexity of Kernel-Based Q-Learning

Sing-Yuan Yeh; Fu-Chieh Chang; Chang-Wei Yueh; Pei-Yuan Wu; Alberto Bernacchia; Sattar Vakili

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:453-469, 2023.

Abstract

Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-action pairs, or simple models such as linearly modeled Q functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a non-parametric Q-learning algorithm that finds an $\varepsilon$-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order-optimal with respect to $\varepsilon$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-yeh23a,
  title     = {Sample Complexity of Kernel-Based Q-Learning},
  author    = {Yeh, Sing-Yuan and Chang, Fu-Chieh and Yueh, Chang-Wei and Wu, Pei-Yuan and Bernacchia, Alberto and Vakili, Sattar},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {453--469},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/yeh23a/yeh23a.pdf},
  url       = {https://proceedings.mlr.press/v206/yeh23a.html},
  abstract  = {Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-action pairs, or simple models such as linearly modeled Q functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a non-parametric Q-learning algorithm that finds an $\varepsilon$-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order-optimal with respect to $\varepsilon$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.}
}

Endnote

%0 Conference Paper
%T Sample Complexity of Kernel-Based Q-Learning
%A Sing-Yuan Yeh
%A Fu-Chieh Chang
%A Chang-Wei Yueh
%A Pei-Yuan Wu
%A Alberto Bernacchia
%A Sattar Vakili
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-yeh23a
%I PMLR
%P 453--469
%U https://proceedings.mlr.press/v206/yeh23a.html
%V 206
%X Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-action pairs, or simple models such as linearly modeled Q functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a non-parametric Q-learning algorithm that finds an $\varepsilon$-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order-optimal with respect to $\varepsilon$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.

APA

Yeh, S., Chang, F., Yueh, C., Wu, P., Bernacchia, A. & Vakili, S. (2023). Sample Complexity of Kernel-Based Q-Learning. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:453-469. Available from https://proceedings.mlr.press/v206/yeh23a.html.
