Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs

Davide Maran; Alberto Maria Metelli; Matteo Papini; Marcello Restelli

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:3743-3774, 2024.

Abstract

We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our $\widetilde{O}(\epsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs $(\nu=0)$. At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $O(\epsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
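
As a quick check of the two limiting cases named in the abstract, one can substitute into the exponent of the stated bound. This is only arithmetic on the formula above, ignoring the logarithmic factors hidden by $\widetilde{O}$; it is not taken from the paper's proofs.

$$
\nu = 0:\quad \epsilon^{-2-\frac{d}{0+1}} = \epsilon^{-(d+2)},
\qquad\qquad
\nu \to \infty:\quad \frac{d}{\nu+1} \to 0 \;\Longrightarrow\; \epsilon^{-2-\frac{d}{\nu+1}} \to \epsilon^{-2}.
$$

For intermediate smoothness the exponent lies strictly between these two values; for example, $d=4$ and $\nu=1$ give $\epsilon^{-2-4/2} = \epsilon^{-4}$.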

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-maran24a,
  title = 	 {Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs},
  author =       {Maran, Davide and Metelli, Alberto Maria and Papini, Matteo and Restelli, Marcello},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {3743--3774},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/maran24a/maran24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/maran24a.html},
  abstract = 	 {We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis.  Our $\widetilde{O}(\epsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs $(\nu=0)$. At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $O(\epsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.  }
}

Endnote

%0 Conference Paper
%T Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs
%A Davide Maran
%A Alberto Maria Metelli
%A Matteo Papini
%A Marcello Restelli
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-maran24a
%I PMLR
%P 3743--3774
%U https://proceedings.mlr.press/v247/maran24a.html
%V 247
%X We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis.  Our $\widetilde{O}(\epsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs $(\nu=0)$. At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $O(\epsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.

APA


Maran, D., Metelli, A.M., Papini, M. & Restelli, M. (2024). Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:3743-3774. Available from https://proceedings.mlr.press/v247/maran24a.html.
