Continuous time policy evaluation is easier with noisy dynamics
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5598-5624, 2026.
Abstract
In this work, we study continuous-time stochastic control problems governed by controlled stochastic differential equations with unknown dynamics. We focus on the discounted infinite-horizon setting and restrict attention to feedback controllers. In general, the continuous-time value function is the solution to the nonlinear Hamilton-Jacobi-Bellman (HJB) equation, which typically admits only viscosity solutions with no regularity. Our first contribution is to establish sharp regularity results for value functions using elliptic partial differential equation theory. Under mild growth and regularity assumptions on the controlled dynamics and a uniform ellipticity condition on the diffusion, we show that the value function belongs to a Matérn reproducing kernel Hilbert space (RKHS) that is strictly smoother than the running reward. Building on this analysis, we develop a kernel-based policy evaluation method that estimates value functions directly from online trajectory rollouts of a fixed policy. The resulting algorithm exploits the RKHS structure through kernel ridge regression, reducing the infinite-dimensional learning problem to a finite-dimensional one. Our results establish a direct connection between stochastic control, elliptic regularity theory, and kernel methods, and provide a foundation for online policy evaluation and policy improvement in continuous time.
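To make the kernel ridge regression step concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy one-dimensional closed-loop system dX = -X dt + sigma dW with running reward -x^2, a Matérn kernel with smoothness 5/2, and Monte Carlo discounted returns from Euler-Maruyama rollouts as regression targets. All function names, the dynamics, the reward, and the hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def matern52(X, Y, length_scale=1.0):
    """Matern kernel (smoothness nu = 5/2) between the rows of X and Y."""
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)) / length_scale
    s = np.sqrt(5.0) * d
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)


def krr_fit(X, y, lam=1e-3, length_scale=1.0):
    """Solve (K + n * lam * I) alpha = y for the n representer coefficients."""
    n = len(X)
    K = matern52(X, X, length_scale)
    return np.linalg.solve(K + n * lam * np.eye(n), y)


def krr_predict(X_train, alpha, X_new, length_scale=1.0):
    """Evaluate the fitted function sum_i alpha_i k(x, x_i) at the rows of X_new."""
    return matern52(X_new, X_train, length_scale) @ alpha


def rollout_return(x0, beta, dt, n_steps, sigma, rng):
    """Discounted reward integral along one Euler-Maruyama rollout.

    Assumed toy model: dX = -X dt + sigma dW, running reward r(x) = -x^2,
    discount rate beta, horizon truncated at n_steps * dt.
    """
    x, total = x0, 0.0
    for k in range(n_steps):
        total += np.exp(-beta * k * dt) * (-(x**2)) * dt
        x = x - x * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta, sigma, dt, n_steps = 1.0, 0.5, 0.01, 1000

    # Start states and noisy Monte Carlo estimates of the value at each of them.
    X = rng.uniform(-2.0, 2.0, size=(200, 1))
    y = np.array([rollout_return(x[0], beta, dt, n_steps, sigma, rng) for x in X])

    alpha = krr_fit(X, y, lam=1e-2)
    grid = np.linspace(-2.0, 2.0, 5)[:, None]
    v_hat = krr_predict(X, alpha, grid)

    # Closed-form value for this toy model, used only as a sanity check.
    v_true = -(grid[:, 0] ** 2 / (beta + 2)
               + 0.5 * sigma**2 * (1 / beta - 1 / (beta + 2)))
    print(np.c_[grid[:, 0], v_hat, v_true])
```

The fit has n unknown coefficients, one per sampled start state, which is the sense in which the regression becomes finite-dimensional. Regressing on full Monte Carlo returns is a simple stand-in here; how the paper builds its targets from online rollouts is not specified in the abstract.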