A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models

Jiayi Wang, Zhengling Qi, Raymond K. W. Wong
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:51273-51302, 2024.

Abstract

In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive theoretical understanding of FQE estimators under both parametric and non-parametric models on the Q-function. Specifically, we address three key questions related to FQE that remain largely unexplored in the current literature: (1) Is the optimal convergence rate for estimating the policy value regarding the sample size $n$ ($n^{-1/2}$) achievable for FQE under a non-parametric model with a fixed horizon ($T$)? (2) How does the error bound depend on the horizon $T$? (3) What is the role of the probability ratio function in improving the convergence of FQE estimators? Specifically, we show that under the completeness assumption of Q-functions, which is mild in the non-parametric setting, the estimation errors for policy value using both parametric and non-parametric FQE estimators can achieve an optimal rate in terms of $n$. The corresponding error bounds in terms of both $n$ and $T$ are also established. With an additional realizability assumption on ratio functions, the rate of estimation errors can be improved from $T^{1.5}/\sqrt{n}$ to $T/\sqrt{n}$, which matches the sharpest known bound in the current literature under the tabular setting.
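For readers unfamiliar with FQE, the following is a minimal sketch of the backward regression it performs, restricted to the tabular, finite-horizon case with a stationary target policy. It is not the paper's non-parametric estimator: the function names, the transition tuple layout `(t, s, a, r, s_next)`, and the use of per-cell averaging (least squares on one-hot features) are all assumptions made for illustration.

```python
import numpy as np


def fitted_q_evaluation(transitions, target_policy, n_states, n_actions, horizon):
    """Tabular finite-horizon FQE (illustrative sketch, hypothetical interface).

    transitions: iterable of (t, s, a, r, s_next) with t in 0..horizon-1.
    target_policy: array of shape (n_states, n_actions) with action probabilities.
    Returns q of shape (horizon + 1, n_states, n_actions), where q[horizon] = 0.
    """
    q = np.zeros((horizon + 1, n_states, n_actions))
    # Work backwards in time: q[t] is regressed on targets built from q[t + 1].
    for t in reversed(range(horizon)):
        sums = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for step, s, a, r, s_next in transitions:
            if step != t:
                continue
            # Regression target: reward plus the next-step Q-value averaged
            # over the target policy's action distribution at s_next.
            next_value = target_policy[s_next] @ q[t + 1, s_next]
            sums[s, a] += r + next_value
            counts[s, a] += 1
        # Least squares with one-hot features reduces to a per-cell mean;
        # state-action pairs never observed at step t are left at zero.
        q[t] = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return q


def policy_value(q, target_policy, initial_dist):
    """Plug-in value estimate: average q[0] over initial states and target actions."""
    return float(initial_dist @ (target_policy * q[0]).sum(axis=1))
```

In a one-state problem where action 0 always pays reward 1 and the target policy always picks action 0, the estimate over a horizon of 3 is the sum of three unit rewards, even though the offline data also contains the other action.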

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24be,
  title     = {A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models},
  author    = {Wang, Jiayi and Qi, Zhengling and Wong, Raymond K. W.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {51273--51302},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24be/wang24be.pdf},
  url       = {https://proceedings.mlr.press/v235/wang24be.html},
  abstract  = {In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive theoretical understanding of FQE estimators under both parametric and non-parametric models on the Q-function. Specifically, we address three key questions related to FQE that remain largely unexplored in the current literature: (1) Is the optimal convergence rate for estimating the policy value regarding the sample size $n$ ($n^{-1/2}$) achievable for FQE under a non-parametric model with a fixed horizon ($T$)? (2) How does the error bound depend on the horizon $T$? (3) What is the role of the probability ratio function in improving the convergence of FQE estimators? Specifically, we show that under the completeness assumption of Q-functions, which is mild in the non-parametric setting, the estimation errors for policy value using both parametric and non-parametric FQE estimators can achieve an optimal rate in terms of $n$. The corresponding error bounds in terms of both $n$ and $T$ are also established. With an additional realizability assumption on ratio functions, the rate of estimation errors can be improved from $T^{1.5}/\sqrt{n}$ to $T/\sqrt{n}$, which matches the sharpest known bound in the current literature under the tabular setting.}
}
Endnote
%0 Conference Paper
%T A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
%A Jiayi Wang
%A Zhengling Qi
%A Raymond K. W. Wong
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24be
%I PMLR
%P 51273--51302
%U https://proceedings.mlr.press/v235/wang24be.html
%V 235
%X In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive theoretical understanding of FQE estimators under both parametric and non-parametric models on the Q-function. Specifically, we address three key questions related to FQE that remain largely unexplored in the current literature: (1) Is the optimal convergence rate for estimating the policy value regarding the sample size $n$ ($n^{-1/2}$) achievable for FQE under a non-parametric model with a fixed horizon ($T$)? (2) How does the error bound depend on the horizon $T$? (3) What is the role of the probability ratio function in improving the convergence of FQE estimators? Specifically, we show that under the completeness assumption of Q-functions, which is mild in the non-parametric setting, the estimation errors for policy value using both parametric and non-parametric FQE estimators can achieve an optimal rate in terms of $n$. The corresponding error bounds in terms of both $n$ and $T$ are also established. With an additional realizability assumption on ratio functions, the rate of estimation errors can be improved from $T^{1.5}/\sqrt{n}$ to $T/\sqrt{n}$, which matches the sharpest known bound in the current literature under the tabular setting.
APA
Wang, J., Qi, Z., & Wong, R. K. W. (2024). A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:51273-51302. Available from https://proceedings.mlr.press/v235/wang24be.html.
