On the Model-Misspecification in Reinforcement Learning

Yunfan Li, Lin Yang
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2764-2772, 2024.

Abstract

The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth model and the best approximator in the function class), it has been shown that policy-based approaches can remain robust even under a large \emph{locally-bounded} misspecification error: the function class may exhibit an $\Omega(1)$ approximation error at specific states and actions, as long as the error stays small on average under a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through careful algorithm design and analysis, value-based and model-based methods employing general function approximation can achieve the same robustness under locally-bounded misspecification error. In particular, they attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound on the misspecification error. Furthermore, we propose an algorithmic framework that achieves the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability.
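To make the abstract's two quantitative statements concrete, the following is a minimal sketch in standard episodic-MDP notation; the symbols $d_h^{\pi}$, $\mathrm{err}_h$, $V_1^{\ast}$, and $s_1^{k}$ are illustrative assumptions and are not notation taken from the paper. A locally-bounded misspecification error of level $\zeta$ can be read as an on-average condition that still tolerates large pointwise errors:
\[
\mathbb{E}_{(s,a)\sim d_h^{\pi}}\big[\mathrm{err}_h(s,a)\big] \le \zeta \ \ \text{for every step } h\in[H] \text{ and every admissible policy } \pi,
\qquad \text{while} \qquad \sup_{s,a}\,\mathrm{err}_h(s,a) \text{ may be } \Omega(1),
\]
where $d_h^{\pi}$ denotes the state-action distribution induced by $\pi$ at step $h$ and $\mathrm{err}_h(s,a)$ is the pointwise approximation error of the best element of the function class (e.g., a Bellman error in the value-based setting or a model-estimation error in the model-based setting). The stated guarantee then controls the cumulative regret over $K$ episodes,
\[
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\Big(V_1^{\ast}(s_1^{k}) - V_1^{\pi_k}(s_1^{k})\Big) \;\le\; \widetilde{O}\Big(\mathrm{poly}(dH)\cdot\big(\sqrt{K} + K\cdot\zeta\big)\Big),
\]
so misspecification enters only through the $K\cdot\zeta$ term governed by the local (on-average) level $\zeta$, rather than by the worst-case pointwise error.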

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-li24m, title = {On the Model-Misspecification in Reinforcement Learning}, author = {Li, Yunfan and Yang, Lin}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {2764--2772}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/li24m/li24m.pdf}, url = {https://proceedings.mlr.press/v238/li24m.html}, abstract = { The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth model and the best approximator in the function class), it has been shown that policy-based approaches can remain robust even under a large \emph{locally-bounded} misspecification error: the function class may exhibit an $\Omega(1)$ approximation error at specific states and actions, as long as the error stays small on average under a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through careful algorithm design and analysis, value-based and model-based methods employing general function approximation can achieve the same robustness under locally-bounded misspecification error. In particular, they attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound on the misspecification error. Furthermore, we propose an algorithmic framework that achieves the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability. } }
Endnote
%0 Conference Paper %T On the Model-Misspecification in Reinforcement Learning %A Yunfan Li %A Lin Yang %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-li24m %I PMLR %P 2764--2772 %U https://proceedings.mlr.press/v238/li24m.html %V 238 %X The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth model and the best approximator in the function class), it has been shown that policy-based approaches can remain robust even under a large \emph{locally-bounded} misspecification error: the function class may exhibit an $\Omega(1)$ approximation error at specific states and actions, as long as the error stays small on average under a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through careful algorithm design and analysis, value-based and model-based methods employing general function approximation can achieve the same robustness under locally-bounded misspecification error. In particular, they attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound on the misspecification error. Furthermore, we propose an algorithmic framework that achieves the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability.
APA
Li, Y. & Yang, L. (2024). On the Model-Misspecification in Reinforcement Learning. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2764-2772. Available from https://proceedings.mlr.press/v238/li24m.html.