High Confidence Generalization for Reinforcement Learning

James Kostas, Yash Chandak, Scott M Jordan, Georgios Theocharous, Philip Thomas
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5764-5773, 2021.

Abstract

We present several classes of reinforcement learning algorithms that safely generalize to Markov decision processes (MDPs) not seen during training. Specifically, we study the setting in which some set of MDPs is accessible for training. The goal is to generalize safely to MDPs that are sampled from the same distribution, but which may not be in the set accessible for training. For various definitions of safety, our algorithms give probabilistic guarantees that agents can safely generalize to MDPs that are sampled from the same distribution but are not necessarily in the training set. These algorithms are a type of Seldonian algorithm (Thomas et al., 2019), which is a class of machine learning algorithms that return models with probabilistic safety guarantees for user-specified definitions of safety.
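The Seldonian framework the abstract refers to typically splits data into a candidate-selection set and a held-out safety set, and returns a solution only if a high-confidence bound on the user's safety measure clears a threshold. The sketch below is a minimal, hypothetical illustration of that safety-test step (not the paper's actual algorithm): it uses a one-sided Hoeffding bound and assumes safety returns are bounded in [0, 1]; the names `safety_test` and `seldonian_select` are illustrative.

```python
import math

def safety_test(safety_returns, threshold, delta):
    """Return True if, with confidence 1 - delta, the mean safety
    measure is at least `threshold`.

    Uses a one-sided Hoeffding bound; assumes each return lies in [0, 1].
    """
    n = len(safety_returns)
    mean = sum(safety_returns) / n
    # Hoeffding confidence-interval half-width for n bounded samples.
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean - eps >= threshold

def seldonian_select(candidate, safety_returns, threshold, delta):
    """Return the candidate solution only if it passes the safety test;
    otherwise report "No Solution Found" (NSF), as Seldonian algorithms do."""
    if safety_test(safety_returns, threshold, delta):
        return candidate
    return "NSF"
```

Note that with few held-out samples the Hoeffding interval is wide, so the test conservatively returns NSF; this is the mechanism behind the probabilistic guarantee: the algorithm prefers returning no solution over returning an unsafe one.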

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-kostas21a,
  title     = {High Confidence Generalization for Reinforcement Learning},
  author    = {Kostas, James and Chandak, Yash and Jordan, Scott M and Theocharous, Georgios and Thomas, Philip},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5764--5773},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/kostas21a/kostas21a.pdf},
  url       = {https://proceedings.mlr.press/v139/kostas21a.html},
  abstract  = {We present several classes of reinforcement learning algorithms that safely generalize to Markov decision processes (MDPs) not seen during training. Specifically, we study the setting in which some set of MDPs is accessible for training. The goal is to generalize safely to MDPs that are sampled from the same distribution, but which may not be in the set accessible for training. For various definitions of safety, our algorithms give probabilistic guarantees that agents can safely generalize to MDPs that are sampled from the same distribution but are not necessarily in the training set. These algorithms are a type of Seldonian algorithm (Thomas et al., 2019), which is a class of machine learning algorithms that return models with probabilistic safety guarantees for user-specified definitions of safety.}
}
Endnote
%0 Conference Paper
%T High Confidence Generalization for Reinforcement Learning
%A James Kostas
%A Yash Chandak
%A Scott M Jordan
%A Georgios Theocharous
%A Philip Thomas
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-kostas21a
%I PMLR
%P 5764--5773
%U https://proceedings.mlr.press/v139/kostas21a.html
%V 139
%X We present several classes of reinforcement learning algorithms that safely generalize to Markov decision processes (MDPs) not seen during training. Specifically, we study the setting in which some set of MDPs is accessible for training. The goal is to generalize safely to MDPs that are sampled from the same distribution, but which may not be in the set accessible for training. For various definitions of safety, our algorithms give probabilistic guarantees that agents can safely generalize to MDPs that are sampled from the same distribution but are not necessarily in the training set. These algorithms are a type of Seldonian algorithm (Thomas et al., 2019), which is a class of machine learning algorithms that return models with probabilistic safety guarantees for user-specified definitions of safety.
APA
Kostas, J., Chandak, Y., Jordan, S.M., Theocharous, G. & Thomas, P. (2021). High Confidence Generalization for Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5764-5773. Available from https://proceedings.mlr.press/v139/kostas21a.html.