A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Shengbo Wang; Nian Si; Jose Blanchet; Zhengyuan Zhou

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3370-3398, 2023.

Abstract

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in [Liu et. al. 2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an

$\epsilon$ error in the sup norm is upper bounded by

$\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$ , where

$\gamma$ is the discount rate,

$p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and

$\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-wang23b,
  title = 	 {A Finite Sample Complexity Bound for Distributionally Robust Q-learning},
  author =       {Wang, Shengbo and Si, Nian and Blanchet, Jose and Zhou, Zhengyuan},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3370--3398},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/wang23b/wang23b.pdf},
  url = 	 {https://proceedings.mlr.press/v206/wang23b.html},
  abstract = 	 {We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in [Liu et. al. 2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.}
}

Endnote

%0 Conference Paper
%T A Finite Sample Complexity Bound for Distributionally Robust Q-learning
%A Shengbo Wang
%A Nian Si
%A Jose Blanchet
%A Zhengyuan Zhou
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-wang23b
%I PMLR
%P 3370--3398
%U https://proceedings.mlr.press/v206/wang23b.html
%V 206
%X We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in [Liu et. al. 2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.

APA


Wang, S., Si, N., Blanchet, J. & Zhou, Z.. (2023). A Finite Sample Complexity Bound for Distributionally Robust Q-learning. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3370-3398 Available from https://proceedings.mlr.press/v206/wang23b.html.

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Abstract

Cite this Paper

Related Material