Constrained Upper Confidence Reinforcement Learning
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:620-629, 2020.
Abstract
Constrained Markov Decision Processes (CMDPs) are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well motivated by a number of applications, including the exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that it achieves sub-linear regret with respect to the reward while, with high probability, satisfying the constraints even during learning. An illustrative example is provided.
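To make the abstract's idea concrete, below is a minimal sketch of the constrained upper-confidence principle, assuming a small tabular CMDP with a known transition kernel P[s, a, s'], rewards and costs bounded in [0, 1], and average-reward planning via an occupancy-measure linear program. The Hoeffding-style bonus, the LP formulation, the cost budget, and all names here (confidence_radius, solve_constrained_lp, plan) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.optimize import linprog

def confidence_radius(counts, t, delta=0.05):
    """Hoeffding-style radius per (s, a); shrinks as visit counts grow."""
    return np.sqrt(np.log(2.0 * t / delta) / np.maximum(counts, 1))

def solve_constrained_lp(P, r_opt, c_pess, budget):
    """Maximize optimistic average reward over stationary occupancy measures
    rho(s, a), subject to flow conservation under the known kernel P and a
    pessimistic average-cost constraint <= budget."""
    S, A = r_opt.shape
    n = S * A
    # Flow conservation: sum_a rho(s, a) = sum_{s', a'} P(s | s', a') rho(s', a')
    A_eq = np.zeros((S + 1, n))
    for s in range(S):
        A_eq[s, s * A:(s + 1) * A] += 1.0
        for sp in range(S):
            for a in range(A):
                A_eq[s, sp * A + a] -= P[sp, a, s]
    A_eq[S, :] = 1.0                       # rho is a probability distribution
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0
    res = linprog(-r_opt.ravel(),          # linprog minimizes, so negate reward
                  A_ub=c_pess.reshape(1, n), b_ub=[budget],
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    if not res.success:                    # infeasible under current estimates
        return None
    rho = res.x.reshape(S, A)
    return rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)

def plan(P, reward_sum, cost_sum, counts, t, budget):
    """One planning step: optimism in the unknown reward, pessimism in the
    unknown cost, so the chosen policy remains feasible with high probability
    even under estimation error."""
    r_hat = reward_sum / np.maximum(counts, 1)
    c_hat = cost_sum / np.maximum(counts, 1)
    beta = confidence_radius(counts, t)
    # Costs are assumed bounded in [0, 1], so the inflated estimate is clipped.
    return solve_constrained_lp(P, r_hat + beta, np.minimum(c_hat + beta, 1.0), budget)
```

The asymmetry is the key design choice in this reading: the bonus is added to the reward estimate to drive exploration (standard optimism), and the same bonus is added to the cost estimate so that the constraint is enforced against a conservative over-estimate, which is what lets the constraints hold with high probability even while learning.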