Action Robust Reinforcement Learning and Applications in Continuous Control

Chen Tessler, Yonathan Efroni, Shie Mannor
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6215-6224, 2019.

Abstract

A policy is said to be robust if it maximizes the reward while considering a bad, or even adversarial, model. In this work we formalize two new criteria of robustness to action uncertainty. Specifically, we consider two scenarios in which the agent attempts to perform an action $a$, and (i) with probability $\alpha$, an alternative adversarial action $\bar{a}$ is taken, or (ii) an adversary adds a perturbation to the selected action in the case of a continuous action space. We show that our criteria are related to common forms of uncertainty in robotics domains, such as the occurrence of abrupt forces, and suggest algorithms in the tabular case. Building on the suggested algorithms, we generalize our approach to deep reinforcement learning (DRL) and provide extensive experiments across various MuJoCo domains. Our experiments show that not only does our approach produce robust policies, but it also improves performance in the absence of perturbations. This generalization indicates that action-robustness can be thought of as implicit regularization in RL problems.
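To make the two corruption models concrete, the sketch below emulates them around a Gym-style environment exposing reset() and step(). The class names, the adversary callable (mapping the last observation to an action), and the default alpha are illustrative assumptions, not the paper's interface, and the wrappers only reproduce the perturbation models; in the paper the adversary is itself optimized in a min-max game rather than held fixed.

```python
import numpy as np


class ProbabilisticActionRobustEnv:
    """Criterion (i): with probability alpha, the agent's chosen action is
    replaced by an action drawn from an adversary policy (hypothetical wrapper)."""

    def __init__(self, env, adversary, alpha=0.1, seed=None):
        self.env = env                # Gym-style env with reset()/step()
        self.adversary = adversary    # hypothetical callable: observation -> action
        self.alpha = alpha            # probability of the adversarial switch
        self.rng = np.random.default_rng(seed)
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        if self.rng.random() < self.alpha:
            # An alternative adversarial action is executed instead.
            action = self.adversary(self._last_obs)
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, done, info


class NoisyActionRobustEnv:
    """Criterion (ii): in a continuous action space, the executed action is a
    convex mixture of the agent's action and an adversarial action -- one way
    to realize an additive perturbation whose weight is bounded by alpha."""

    def __init__(self, env, adversary, alpha=0.1):
        self.env = env
        self.adversary = adversary
        self.alpha = alpha
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        perturbation = self.adversary(self._last_obs)
        mixed = (1.0 - self.alpha) * np.asarray(action) + self.alpha * np.asarray(perturbation)
        obs, reward, done, info = self.env.step(mixed)
        self._last_obs = obs
        return obs, reward, done, info
```

Plugging a fixed adversary (e.g., one that always pushes toward an extreme action) into either wrapper already reproduces the kind of abrupt-force disturbances mentioned above; robust training additionally alternates updates of the agent and the adversary.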

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-tessler19a,
  title     = {Action Robust Reinforcement Learning and Applications in Continuous Control},
  author    = {Tessler, Chen and Efroni, Yonathan and Mannor, Shie},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6215--6224},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/tessler19a/tessler19a.pdf},
  url       = {https://proceedings.mlr.press/v97/tessler19a.html},
  abstract  = {A policy is said to be robust if it maximizes the reward while considering a bad, or even adversarial, model. In this work we formalize two new criteria of robustness to action uncertainty. Specifically, we consider two scenarios in which the agent attempts to perform an action $a$, and (i) with probability $\alpha$, an alternative adversarial action $\bar{a}$ is taken, or (ii) an adversary adds a perturbation to the selected action in the case of continuous action space. We show that our criteria are related to common forms of uncertainty in robotics domains, such as the occurrence of abrupt forces, and suggest algorithms in the tabular case. Building on the suggested algorithms, we generalize our approach to deep reinforcement learning (DRL) and provide extensive experiments in the various MuJoCo domains. Our experiments show that not only does our approach produce robust policies, but it also improves the performance in the absence of perturbations. This generalization indicates that action-robustness can be thought of as implicit regularization in RL problems.}
}
Endnote
%0 Conference Paper
%T Action Robust Reinforcement Learning and Applications in Continuous Control
%A Chen Tessler
%A Yonathan Efroni
%A Shie Mannor
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-tessler19a
%I PMLR
%P 6215--6224
%U https://proceedings.mlr.press/v97/tessler19a.html
%V 97
%X A policy is said to be robust if it maximizes the reward while considering a bad, or even adversarial, model. In this work we formalize two new criteria of robustness to action uncertainty. Specifically, we consider two scenarios in which the agent attempts to perform an action $a$, and (i) with probability $\alpha$, an alternative adversarial action $\bar{a}$ is taken, or (ii) an adversary adds a perturbation to the selected action in the case of continuous action space. We show that our criteria are related to common forms of uncertainty in robotics domains, such as the occurrence of abrupt forces, and suggest algorithms in the tabular case. Building on the suggested algorithms, we generalize our approach to deep reinforcement learning (DRL) and provide extensive experiments in the various MuJoCo domains. Our experiments show that not only does our approach produce robust policies, but it also improves the performance in the absence of perturbations. This generalization indicates that action-robustness can be thought of as implicit regularization in RL problems.
APA
Tessler, C., Efroni, Y. & Mannor, S. (2019). Action Robust Reinforcement Learning and Applications in Continuous Control. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6215-6224. Available from https://proceedings.mlr.press/v97/tessler19a.html.
