Provably Safe PAC-MDP Exploration Using Analogies

Melrose Roderick, Vaishnavh Nagarajan, Zico Kolter
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1216-1224, 2021.

Abstract

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques 1) do not guarantee safety during the actual exploration process and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE guides exploration towards the most task-relevant states, which empirically yields significant improvements in sample efficiency over existing methods.
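
To make the setting concrete, below is a minimal, hypothetical sketch of confidence-bound-based safe exploration in a tabular MDP, in which state-action pairs belonging to the same "analogy class" pool their transition statistics. This is not the ASE algorithm from the paper; the function names (ucb_unsafe_prob, safe_actions, analogy_class), the Hoeffding-style bound, and the safety threshold are all illustrative assumptions.

# Illustrative sketch only -- NOT the ASE algorithm from the paper.
# A toy safe-exploration rule for a tabular MDP: only take actions whose
# estimated probability of entering an unsafe state, upper-bounded by a
# Hoeffding-style confidence interval, stays below a safety threshold.
# "Analogies" are approximated by letting groups of state-action pairs
# (defined by `analogy_class`) share their transition counts.
import numpy as np

def ucb_unsafe_prob(unsafe_counts, total_counts, delta=0.05):
    """Upper confidence bound on the probability of an unsafe transition."""
    if total_counts == 0:
        return 1.0  # no data: assume the worst case
    p_hat = unsafe_counts / total_counts
    bonus = np.sqrt(np.log(2.0 / delta) / (2.0 * total_counts))
    return min(1.0, p_hat + bonus)

def safe_actions(state, actions, counts, analogy_class, threshold=0.1):
    """Return the actions currently certified safe enough to explore."""
    safe = []
    for a in actions:
        # Pool statistics across all state-action pairs in the same
        # analogy class, so experience in one state informs another.
        key = analogy_class(state, a)
        unsafe, total = counts.get(key, (0, 0))
        if ucb_unsafe_prob(unsafe, total) <= threshold:
            safe.append(a)
    return safe

# Example: with no data yet, every action is ruled out until evidence accrues.
counts = {}
print(safe_actions(state=0, actions=[0, 1], counts=counts,
                   analogy_class=lambda s, a: (s % 2, a)))  # -> []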

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-roderick21a,
  title     = {Provably Safe PAC-MDP Exploration Using Analogies},
  author    = {Roderick, Melrose and Nagarajan, Vaishnavh and Kolter, Zico},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1216--1224},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/roderick21a/roderick21a.pdf},
  url       = {https://proceedings.mlr.press/v130/roderick21a.html}
}
Endnote
%0 Conference Paper
%T Provably Safe PAC-MDP Exploration Using Analogies
%A Melrose Roderick
%A Vaishnavh Nagarajan
%A Zico Kolter
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-roderick21a
%I PMLR
%P 1216--1224
%U https://proceedings.mlr.press/v130/roderick21a.html
%V 130
APA
Roderick, M., Nagarajan, V. & Kolter, Z. (2021). Provably Safe PAC-MDP Exploration Using Analogies. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1216-1224. Available from https://proceedings.mlr.press/v130/roderick21a.html.