Constrained Reinforcement Learning via Policy Splitting

Haoxian Chen; Henry Lam; Fengpei Li; Amirhossein Meisami

Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:209-224, 2020.

Abstract

We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are in the form of infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure where we first search over deterministic policies, followed by an aggregation with a mixture parameter search, that generates policies with simultaneous guarantees on near-optimality and feasibility. We also numerically illustrate our approach by applying it to an online advertising problem.
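The aggregation step can be illustrated with a small sketch. This is not the paper's algorithm. It assumes the mixture is formed by randomizing once, at the start, between two fixed deterministic policies, so that the discounted reward and cost of the mixture are convex combinations of the two policies' values. It also assumes those values are known exactly, whereas the paper learns them from data. The function names and all numbers below are hypothetical.

```python
def mixture_weight(cost_lo, cost_hi, budget):
    """Largest probability p in [0, 1] of picking the high-cost policy
    such that p * cost_hi + (1 - p) * cost_lo <= budget.

    Assumes cost_lo <= cost_hi (hypothetical discounted costs of two
    deterministic policies).
    """
    if cost_lo > budget:
        raise ValueError("even the cheaper policy violates the budget")
    if cost_hi <= budget:
        # The high-cost policy is already feasible, so no mixing is needed.
        return 1.0
    # The mixture cost is linear in p, so solve for the boundary directly.
    return (budget - cost_lo) / (cost_hi - cost_lo)


def mixed_value(p, value_lo, value_hi):
    """Discounted value of the mixture that picks the high-cost policy
    with probability p and the low-cost policy otherwise."""
    return p * value_hi + (1.0 - p) * value_lo


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    p = mixture_weight(cost_lo=2.0, cost_hi=6.0, budget=5.0)
    print("mixture weight:", p)
    print("mixture cost:", mixed_value(p, 2.0, 6.0))
    print("mixture reward:", mixed_value(p, 10.0, 14.0))
```

With noisy estimates, solving for the boundary exactly as above could produce a slightly infeasible mixture, which is one reason a search with explicit feasibility guarantees is needed in the learned setting.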

Cite this Paper

BibTeX


@InProceedings{pmlr-v129-chen20b,
  title = 	 {Constrained Reinforcement Learning via Policy Splitting},
  author =       {Chen, Haoxian and Lam, Henry and Li, Fengpei and Meisami, Amirhossein},
  booktitle = 	 {Proceedings of The 12th Asian Conference on Machine Learning},
  pages = 	 {209--224},
  year = 	 {2020},
  editor = 	 {Pan, Sinno Jialin and Sugiyama, Masashi},
  volume = 	 {129},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--20 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v129/chen20b/chen20b.pdf},
  url = 	 {https://proceedings.mlr.press/v129/chen20b.html},
  abstract = 	 {We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are in the form of infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure where we first search over deterministic policies, followed by an aggregation with a mixture parameter search, that generates policies with simultaneous guarantees on near-optimality and feasibility. We also numerically illustrate our approach by applying it to an online advertising problem.}
}

Endnote

%0 Conference Paper
%T Constrained Reinforcement Learning via Policy Splitting
%A Haoxian Chen
%A Henry Lam
%A Fengpei Li
%A Amirhossein Meisami
%B Proceedings of The 12th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Sinno Jialin Pan
%E Masashi Sugiyama
%F pmlr-v129-chen20b
%I PMLR
%P 209--224
%U https://proceedings.mlr.press/v129/chen20b.html
%V 129
%X We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are in the form of infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure where we first search over deterministic policies, followed by an aggregation with a mixture parameter search, that generates policies with simultaneous guarantees on near-optimality and feasibility. We also numerically illustrate our approach by applying it to an online advertising problem.

APA


Chen, H., Lam, H., Li, F. & Meisami, A. (2020). Constrained Reinforcement Learning via Policy Splitting. Proceedings of The 12th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 129:209-224. Available from https://proceedings.mlr.press/v129/chen20b.html.
