On the Model-Misspecification in Reinforcement Learning

Yunfan Li, Lin Yang
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2764-2772, 2024.

Abstract

The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth model and the best approximator in the function class), it has been shown that policy-based approaches can remain robust even under a large \emph{locally-bounded} misspecification error: the function class may exhibit an $\Omega(1)$ approximation error at specific states and actions, as long as the error stays small on average under a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through careful algorithm design and analysis, value-based and model-based methods employing general function approximation can achieve the same robustness under locally-bounded misspecification error. In particular, they attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound on the misspecification error. Furthermore, we propose an algorithmic framework that achieves the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability.
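To make the abstract's two quantitative statements concrete, the following is a minimal sketch in standard episodic-MDP notation; the symbols $d_h^{\pi}$, $\mathrm{err}_h$, $V_1^{\ast}$, and $s_1^{k}$ are illustrative assumptions and are not notation taken from the paper. A locally-bounded misspecification error of level $\zeta$ can be read as an on-average condition that still tolerates large pointwise errors:
\[
\mathbb{E}_{(s,a)\sim d_h^{\pi}}\big[\mathrm{err}_h(s,a)\big] \le \zeta \ \ \text{for every step } h\in[H] \text{ and every admissible policy } \pi,
\qquad \text{while} \qquad \sup_{s,a}\,\mathrm{err}_h(s,a) \text{ may be } \Omega(1),
\]
where $d_h^{\pi}$ denotes the state-action distribution induced by $\pi$ at step $h$ and $\mathrm{err}_h(s,a)$ is the pointwise approximation error of the best element of the function class (e.g., a Bellman error in the value-based setting or a model-estimation error in the model-based setting). The stated guarantee then controls the cumulative regret over $K$ episodes,
\[
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\Big(V_1^{\ast}(s_1^{k}) - V_1^{\pi_k}(s_1^{k})\Big) \;\le\; \widetilde{O}\Big(\mathrm{poly}(dH)\cdot\big(\sqrt{K} + K\cdot\zeta\big)\Big),
\]
so misspecification enters only through the $K\cdot\zeta$ term governed by the local (on-average) level $\zeta$, rather than by the worst-case pointwise error.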

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-li24m, title = {On the Model-Misspecification in Reinforcement Learning}, author = {Li, Yunfan and Yang, Lin}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {2764--2772}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/li24m/li24m.pdf}, url = {https://proceedings.mlr.press/v238/li24m.html}, abstract = { The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth model and the best approximator in the function class), it has been shown that policy-based approaches can remain robust even under a large \emph{locally-bounded} misspecification error: the function class may exhibit an $\Omega(1)$ approximation error at specific states and actions, as long as the error stays small on average under a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through careful algorithm design and analysis, value-based and model-based methods employing general function approximation can achieve the same robustness under locally-bounded misspecification error. In particular, they attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound on the misspecification error. Furthermore, we propose an algorithmic framework that achieves the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability. } }
Endnote
%0 Conference Paper %T On the Model-Misspecification in Reinforcement Learning %A Yunfan Li %A Lin Yang %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-li24m %I PMLR %P 2764--2772 %U https://proceedings.mlr.press/v238/li24m.html %V 238 %X The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth model and the best approximator in the function class), it has been shown that policy-based approaches can remain robust even under a large \emph{locally-bounded} misspecification error: the function class may exhibit an $\Omega(1)$ approximation error at specific states and actions, as long as the error stays small on average under a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through careful algorithm design and analysis, value-based and model-based methods employing general function approximation can achieve the same robustness under locally-bounded misspecification error. In particular, they attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound on the misspecification error. Furthermore, we propose an algorithmic framework that achieves the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability.
APA
Li, Y. & Yang, L. (2024). On the Model-Misspecification in Reinforcement Learning. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2764-2772. Available from https://proceedings.mlr.press/v238/li24m.html.