Model-Based Reinforcement Learning with Value-Targeted Regression
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:666-686, 2020.
Abstract
Reinforcement learning (RL) applies to control problems with large state and action spaces, so it is natural to consider RL with a parametric model. In this paper we focus on finite-horizon episodic RL where the transition model admits the linear parametrization $P = \sum_{i=1}^{d} (\theta)_i P_i$. This parametrization provides universal function approximation and captures several useful models and applications. We propose an upper confidence model-based RL algorithm with value-targeted model parameter estimation. The algorithm updates the estimate of $\theta$ by recursively solving a regression problem using the latest value estimate as the target. We demonstrate the efficiency of our algorithm by proving an expected regret bound of $\tilde{O}(d\sqrt{H^3 T})$, where $H$, $T$, and $d$ are the horizon, the total number of steps, and the dimension of $\theta$, respectively. This regret bound is independent of the total number of states or actions and is close to a lower bound of $\Omega(\sqrt{HdT})$.
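The regression step described in the abstract can be illustrated with a minimal sketch. It is not the paper's full algorithm: it assumes a small tabular problem, holds the value estimate fixed, samples state-action pairs uniformly, and omits the upper-confidence bonus and the planning step. The names `vtr_features`, `ValueTargetedRegression`, and `run_demo`, the ridge parameter `reg`, and all problem sizes are hypothetical choices made for illustration. Each observed transition contributes a feature vector whose i-th entry is the expected next-state value under base model P_i, and the regression target is the value of the state actually reached.

```python
import numpy as np


def vtr_features(base_models, s, a, value):
    """Feature vector for (s, a): entry i is the expected next-state value
    under base model P_i. base_models has shape (d, S, A, S)."""
    return base_models[:, s, a, :] @ value  # shape (d,)


class ValueTargetedRegression:
    """Recursive ridge regression for theta (illustrative sketch only)."""

    def __init__(self, d, reg=1.0):
        self.M = reg * np.eye(d)  # regularized Gram matrix
        self.w = np.zeros(d)      # running sum of target * feature

    def update(self, x, y):
        self.M += np.outer(x, x)
        self.w += y * x

    def estimate(self):
        return np.linalg.solve(self.M, self.w)


def run_demo(n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    d, S, A = 2, 6, 3
    # Hypothetical base models: each P_i(. | s, a) is a random distribution.
    base_models = rng.dirichlet(np.ones(S), size=(d, S, A))
    theta_true = np.array([0.3, 0.7])
    # True transition model P = sum_i theta_i * P_i, shape (S, A, S).
    true_model = np.tensordot(theta_true, base_models, axes=1)
    # Assumption: a fixed value estimate; the paper uses the latest estimate.
    value = np.arange(S, dtype=float)

    vtr = ValueTargetedRegression(d)
    for _ in range(n_steps):
        s, a = rng.integers(S), rng.integers(A)
        s_next = rng.choice(S, p=true_model[s, a])
        x = vtr_features(base_models, s, a, value)
        vtr.update(x, value[s_next])  # target: value at the observed next state
    return vtr.estimate(), theta_true


theta_hat, theta_true = run_demo()
print("estimated theta:", theta_hat)
print("true theta:     ", theta_true)
```

The regression has only d unknowns no matter how many states or actions there are, which is in keeping with a regret bound that does not depend on the size of the state or action space.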