Value Function Approximations via Kernel Embeddings for No-Regret Reinforcement Learning
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:249-264, 2023.
Abstract
We consider the regret minimization problem in
episodic reinforcement learning (RL).
In many real-world RL environments, the state and
action spaces are continuous or very large. Existing
approaches establish regret guarantees either via a
low-dimensional representation of the stochastic
transition model or via an approximation of the
$Q$-functions. However, an understanding of
function approximation schemes for state-value
functions remains largely missing. In this paper, we
propose an online model-based RL algorithm, CME-RL,
that learns embeddings of the state-transition
distribution in a reproducing kernel Hilbert space
while carefully balancing the
exploitation-exploration tradeoff. We demonstrate
the efficiency of our algorithm by proving a
frequentist (worst-case) regret bound of order
$\tilde{O}\big(H\gamma_N\sqrt{N}\big)$\footnote{
$\tilde{O}(\cdot)$ hides only absolute constants and
poly-logarithmic factors.}, where $H$ is the episode
length, $N$ is the total number of time steps, and
$\gamma_N$ is an information-theoretic quantity
capturing the effective dimension of the state-action
feature space. Our method bypasses the need to
estimate transition probabilities and applies to
any domain on which kernels can be defined. It also
brings new insights into the general theory of
kernel methods for approximate inference and RL
regret minimization.
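
In the kernelized bandit and RL literature, $\gamma_N$ typically denotes the maximum information gain; assuming the paper follows this convention (the abstract itself does not define it), the quantity can be written as

\[
\gamma_N \;=\; \max_{\{x_1,\dots,x_N\}\subset \mathcal{S}\times\mathcal{A}} \;\tfrac{1}{2}\,\log\det\!\big(I + \lambda^{-1}K_N\big),
\qquad (K_N)_{ij} = k(x_i, x_j),
\]

where $\mathcal{S}\times\mathcal{A}$ is the state-action space, $k$ is the kernel, and $\lambda>0$ is a regularization parameter. For example, for the squared-exponential kernel on a $d$-dimensional domain, $\gamma_N$ grows only polylogarithmically in $N$, which makes a bound of order $\tilde{O}\big(H\gamma_N\sqrt{N}\big)$ sublinear in $N$.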
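
Since the abstract gives no pseudocode, the Python sketch below only illustrates the generic conditional mean embedding (CME) machinery the algorithm's name points to: a kernel-ridge-regression estimate of the backed-up value $E[V(s') \mid s,a]$ from observed transitions, plus a posterior-variance-style width used as an optimism bonus. The RBF kernel, the bonus form, and the constants lam and beta are illustrative assumptions, not the paper's CME-RL specification.

import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def cme_optimistic_backup(SA, V_next, SA_query, lam=0.1, beta=1.0):
    # SA:       (n, d) observed state-action features (s_i, a_i)
    # V_next:   (n,)   value estimates at the observed next states V(s'_i)
    # SA_query: (m, d) state-action pairs at which to evaluate the backup
    # lam, beta: ridge regularizer and optimism scaling (hypothetical choices)
    n = SA.shape[0]
    K = rbf_kernel(SA, SA)                  # Gram matrix K_ij = k((s_i,a_i),(s_j,a_j))
    k_q = rbf_kernel(SA, SA_query)          # cross-kernel vectors, shape (n, m)
    G = K + n * lam * np.eye(n)
    alpha = np.linalg.solve(G, V_next)      # (K + n*lam*I)^{-1} v
    mean = k_q.T @ alpha                    # CME estimate of E[V(s') | s, a]
    # Posterior-variance-style width: k(x,x) - k_q^T (K + n*lam*I)^{-1} k_q; k(x,x)=1 for RBF.
    width_sq = 1.0 - np.sum(k_q * np.linalg.solve(G, k_q), axis=0)
    width = np.sqrt(np.maximum(width_sq, 0.0))
    return mean + beta * width              # optimistic (upper-confidence) backed-up value

# Example: 200 random transitions in a 3-dimensional state-action feature space.
rng = np.random.default_rng(0)
SA = rng.normal(size=(200, 3))
V_next = rng.normal(size=200)
print(cme_optimistic_backup(SA, V_next, rng.normal(size=(5, 3))))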