Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Omar Darwiche Domingues; Pierre Menard; Matteo Pirotta; Emilie Kaufmann; Michal Valko

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Omar Darwiche Domingues, Pierre Menard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2783-2792, 2021.

Abstract

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and applies to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-domingues21a,
  title = 	 {Kernel-Based Reinforcement Learning: A Finite-Time Analysis},
  author =       {Domingues, Omar Darwiche and Menard, Pierre and Pirotta, Matteo and Kaufmann, Emilie and Valko, Michal},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {2783--2792},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/domingues21a/domingues21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/domingues21a.html},
  abstract = 	 {We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and applies to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.}
}

Endnote

%0 Conference Paper
%T Kernel-Based Reinforcement Learning: A Finite-Time Analysis
%A Omar Darwiche Domingues
%A Pierre Menard
%A Matteo Pirotta
%A Emilie Kaufmann
%A Michal Valko
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-domingues21a
%I PMLR
%P 2783--2792
%U https://proceedings.mlr.press/v139/domingues21a.html
%V 139
%X We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and applies to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.

APA

Domingues, O.D., Menard, P., Pirotta, M., Kaufmann, E. & Valko, M.. (2021). Kernel-Based Reinforcement Learning: A Finite-Time Analysis. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2783-2792 Available from https://proceedings.mlr.press/v139/domingues21a.html.

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Abstract

Cite this Paper

Related Material