A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:51273-51302, 2024.
Abstract
In this paper, we study the statistical properties of the fitted Q-evaluation (FQE) method, which estimates the value of a target policy from offline data generated by a behavior policy. We provide a comprehensive theoretical understanding of FQE estimators under both parametric and non-parametric models of the Q-function. In particular, we address three key questions about FQE that remain largely unexplored in the current literature: (1) Is the optimal convergence rate $n^{-1/2}$ in the sample size $n$ for estimating the policy value achievable by FQE under a non-parametric model with a fixed horizon $T$? (2) How does the error bound depend on the horizon $T$? (3) What role does the probability ratio function play in improving the convergence of FQE estimators? We show that under the completeness assumption on Q-functions, which is mild in the non-parametric setting, the policy-value estimation errors of both parametric and non-parametric FQE estimators achieve the optimal rate in $n$. We also establish the corresponding error bounds in terms of both $n$ and $T$. With an additional realizability assumption on the ratio functions, the rate of the estimation error improves from $T^{1.5}/\sqrt{n}$ to $T/\sqrt{n}$, which matches the sharpest known bound in the current literature under the tabular setting.
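To make the FQE procedure concrete, here is a minimal sketch of finite-horizon FQE in the tabular setting only, the simplest case mentioned above. It is not the paper's estimator for parametric or non-parametric Q-function models. The function name `tabular_fqe`, the episode layout `(s, a, r, s_next)`, and the policy table `pi[t][s][a]` are hypothetical choices made for illustration. Working backward from $Q_{T+1} = 0$, each step regresses the target $r + \sum_{a'} \pi_{t+1}(a' \mid s') Q_{t+1}(s', a')$ on $(s, a)$. In the tabular case, that least-squares fit reduces to a per-cell sample mean.

```python
from collections import defaultdict


def tabular_fqe(data, pi, n_states, n_actions, horizon):
    """Hypothetical sketch of finite-horizon tabular fitted Q-evaluation.

    data: list of episodes; each episode is a list of `horizon` tuples
          (s, a, r, s_next) collected under some behavior policy.
    pi:   pi[t][s][a] = probability that the target policy takes action a
          in state s at step t (an assumed data layout).
    Returns the estimated target-policy value, averaged over the
    observed initial states.
    """
    # q_next holds Q_{t+1}; by convention Q_{T+1} = 0.
    q_next = [[0.0] * n_actions for _ in range(n_states)]
    for t in reversed(range(horizon)):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for episode in data:
            s, a, r, s_next = episode[t]
            # Regression target: r + E_{a' ~ pi_{t+1}(.|s')} Q_{t+1}(s', a').
            if t + 1 < horizon:
                v_next = sum(pi[t + 1][s_next][b] * q_next[s_next][b]
                             for b in range(n_actions))
            else:
                v_next = 0.0
            sums[(s, a)] += r + v_next
            counts[(s, a)] += 1
        # Tabular least squares is the per-(s, a) sample mean;
        # (s, a) pairs never visited at step t default to 0.
        q_next = [[sums[(s, a)] / counts[(s, a)] if counts[(s, a)] else 0.0
                   for a in range(n_actions)]
                  for s in range(n_states)]
    # q_next now holds Q_1; plug in the target policy at the initial states.
    total = 0.0
    for episode in data:
        s0 = episode[0][0]
        total += sum(pi[0][s0][a] * q_next[s0][a] for a in range(n_actions))
    return total / len(data)


# Toy example: one state, two actions, horizon 2, reward equals the action.
# The behavior data cover every action pair once (uniform behavior), while
# the target policy always picks action 1, so its true value is 2.
data = [[(0, a0, float(a0), 0), (0, a1, float(a1), 0)]
        for a0 in (0, 1) for a1 in (0, 1)]
pi = [[[0.0, 1.0]] for _ in range(2)]
print(tabular_fqe(data, pi, n_states=1, n_actions=2, horizon=2))  # 2.0
```

In the toy example, the behavior policy's own average return is 1.0. FQE nonetheless recovers the target policy's value of 2.0 because it reweights through the fitted Q-functions rather than averaging the observed returns. In a parametric or non-parametric version, the per-cell average would be replaced by a general regression step over a chosen function class.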