Continuous time policy evaluation is easier with noisy dynamics

Samuel Robertson; Thomas Newton; Csaba Szepesvári

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5598-5624, 2026.

Abstract

In this work, we study continuous-time stochastic control problems governed by controlled stochastic differential equations with unknown dynamics. We focus on the discounted infinite-horizon setting and restrict attention to feedback controllers. In general, the continuous time value function is the solution to the nonlinear Hamilton-Jacobi-Bellman (HJB) equation, which typically only admits viscosity solutions with no regularity. Our first contribution is to establish sharp regularity results for value functions using elliptic partial differential equation theory. Under mild growth and regularity assumptions on the controlled dynamics and a uniform ellipticity condition on the diffusion, we show that the value function belongs to a Matérn reproducing kernel Hilbert space (RKHS) that is strictly smoother than the running reward. Building on this analysis, we develop a kernel-based policy evaluation method that estimates value functions directly from online trajectory rollouts of a fixed policy. The resulting algorithm exploits the RKHS structure with a kernel ridge regression technique, reducing the infinite-dimensional learning problem to a finite-dimensional one. Our results establish a direct connection between stochastic control, elliptic regularity theory, and kernel methods, and provide a foundation for online policy evaluation and policy improvement in continuous time.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-robertson26a,
  title = 	 {Continuous time policy evaluation is easier with noisy dynamics},
  author =       {Robertson, Samuel and Newton, Thomas and Szepesv{\'a}ri, Csaba},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {5598--5624},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/robertson26a/robertson26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/robertson26a.html},
  abstract = 	 {In this work, we study continuous-time stochastic control problems governed by controlled stochastic differential equations with unknown dynamics. We focus on the discounted infinite-horizon setting and restrict attention to feedback controllers. In general, the continuous time value function is the solution to the nonlinear Hamilton-Jacobi-Bellman (HJB) equation, which typically only admits viscosity solutions with no regularity. Our first contribution is to establish sharp regularity results for value functions using elliptic partial differential equation theory. Under mild growth and regularity assumptions on the controlled dynamics and a uniform ellipticity condition on the diffusion, we show that the value function belongs to a Matérn reproducing kernel Hilbert space (RKHS) that is strictly smoother than the running reward. Building on this analysis, we develop a kernel-based policy evaluation method that estimates value functions directly from online trajectory rollouts of a fixed policy. The resulting algorithm exploits the RKHS structure with a kernel ridge regression technique, reducing the infinite-dimensional learning problem to a finite-dimensional one. Our results establish a direct connection between stochastic control, elliptic regularity theory, and kernel methods, and provide a foundation for online policy evaluation and policy improvement in continuous time.}
}

Endnote

%0 Conference Paper
%T Continuous time policy evaluation is easier with noisy dynamics
%A Samuel Robertson
%A Thomas Newton
%A Csaba Szepesvári
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-robertson26a
%I PMLR
%P 5598--5624
%U https://proceedings.mlr.press/v336/robertson26a.html
%V 336
%X In this work, we study continuous-time stochastic control problems governed by controlled stochastic differential equations with unknown dynamics. We focus on the discounted infinite-horizon setting and restrict attention to feedback controllers. In general, the continuous time value function is the solution to the nonlinear Hamilton-Jacobi-Bellman (HJB) equation, which typically only admits viscosity solutions with no regularity. Our first contribution is to establish sharp regularity results for value functions using elliptic partial differential equation theory. Under mild growth and regularity assumptions on the controlled dynamics and a uniform ellipticity condition on the diffusion, we show that the value function belongs to a Matérn reproducing kernel Hilbert space (RKHS) that is strictly smoother than the running reward. Building on this analysis, we develop a kernel-based policy evaluation method that estimates value functions directly from online trajectory rollouts of a fixed policy. The resulting algorithm exploits the RKHS structure with a kernel ridge regression technique, reducing the infinite-dimensional learning problem to a finite-dimensional one. Our results establish a direct connection between stochastic control, elliptic regularity theory, and kernel methods, and provide a foundation for online policy evaluation and policy improvement in continuous time.

APA

Robertson, S., Newton, T. & Szepesvári, C. (2026). Continuous time policy evaluation is easier with noisy dynamics. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5598-5624. Available from https://proceedings.mlr.press/v336/robertson26a.html.
