On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu; Yunchang Yang; Simon Du; Liwei Wang

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu, Yunchang Yang, Simon Du, Liwei Wang

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11296-11306, 2021.

Abstract

We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of

$\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA)\right)\polylog(H)\right)$ , where

$S$ is the number of states,

$A$ is the number of actions,

$H$ is the planning horizon,

$K$ is the number of episodes, and

$C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a

$\sqrt{K}$ -type regret in this setting and is computationally efficient.

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-wu21g,
  title = 	 {On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP},
  author =       {Wu, Tianhao and Yang, Yunchang and Du, Simon and Wang, Liwei},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {11296--11306},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/wu21g/wu21g.pdf},
  url = 	 {https://proceedings.mlr.press/v139/wu21g.html},
  abstract = 	 {We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of $\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA)\right)\polylog(H)\right)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting and is computationally efficient.}
}

Endnote

%0 Conference Paper
%T On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP
%A Tianhao Wu
%A Yunchang Yang
%A Simon Du
%A Liwei Wang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-wu21g
%I PMLR
%P 11296--11306
%U https://proceedings.mlr.press/v139/wu21g.html
%V 139
%X We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of $\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA)\right)\polylog(H)\right)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting and is computationally efficient.

APA


Wu, T., Yang, Y., Du, S. & Wang, L.. (2021). On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11296-11306 Available from https://proceedings.mlr.press/v139/wu21g.html.

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Abstract

Cite this Paper

Related Material