On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu, Yunchang Yang, Simon Du, Liwei Wang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11296-11306, 2021.

Abstract

We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of $\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA)\right)\polylog(H)\right)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting and is computationally efficient.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-wu21g, title = {On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP}, author = {Wu, Tianhao and Yang, Yunchang and Du, Simon and Wang, Liwei}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {11296--11306}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/wu21g/wu21g.pdf}, url = {https://proceedings.mlr.press/v139/wu21g.html}, abstract = {We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of $\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA)\right)\polylog(H)\right)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting and is computationally efficient.} }
Endnote
%0 Conference Paper %T On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP %A Tianhao Wu %A Yunchang Yang %A Simon Du %A Liwei Wang %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-wu21g %I PMLR %P 11296--11306 %U https://proceedings.mlr.press/v139/wu21g.html %V 139 %X We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (\textsf{CR-MVP}), which achieves a regret bound of $\tilde{O}\left(\left(\sqrt{SAK}+S^2A+CSA)\right)\polylog(H)\right)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the corruption level. We also provide a corresponding lower bound, which indicates that our upper bound is tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting and is computationally efficient.
APA
Wu, T., Yang, Y., Du, S. & Wang, L.. (2021). On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11296-11306 Available from https://proceedings.mlr.press/v139/wu21g.html.

Related Material