Constrained Reinforcement Learning via Policy Splitting

Haoxian Chen, Henry Lam, Fengpei Li, Amirhossein Meisami
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:209-224, 2020.

Abstract

We develop a model-free reinforcement learning approach for solving constrained Markov decision processes, where the objective and budget constraints are in the form of infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure in which we first search over deterministic policies and then aggregate them with a mixture parameter search, generating policies with simultaneous guarantees on near-optimality and feasibility. We also numerically illustrate our approach by applying it to an online advertising problem.
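
The abstract describes the two-stage "policy splitting" idea only at a high level. Below is a minimal sketch of that idea in Python for a known tabular CMDP with a single budget constraint; it is not the paper's actual model-free, data-driven algorithm. The function names (discounted_value, greedy_policy, policy_split), the reduction of the first-stage search to two Lagrangian endpoints, and the use of bisection for the mixture parameter search are all illustrative assumptions.

import numpy as np

def discounted_value(P, z, policy, gamma):
    # Exact discounted value of a deterministic policy in a tabular MDP.
    # P: (nS, nA, nS) transitions; z: (nS, nA) per-step reward or cost;
    # policy: (nS,) action index per state.
    nS = P.shape[0]
    P_pi = P[np.arange(nS), policy]          # (nS, nS) transitions under policy
    z_pi = z[np.arange(nS), policy]          # (nS,) per-state payoff under policy
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, z_pi)

def greedy_policy(P, r, c, lam, gamma, iters=500):
    # Deterministic policy maximizing r - lam * c, via value iteration.
    nS, nA = r.shape
    v = np.zeros(nS)
    for _ in range(iters):
        q = (r - lam * c) + gamma * P @ v    # (nS, nA) state-action values
        v = q.max(axis=1)
    return q.argmax(axis=1)

def policy_split(P, r, c, budget, gamma, s0=0, lam_hi=100.0, tol=1e-6):
    # Stage 1 (simplified): two deterministic endpoint policies, one
    # reward-greedy (may violate the budget), one cost-conservative
    # (assumed feasible for a large enough multiplier lam_hi).
    pi_r = greedy_policy(P, r, c, 0.0, gamma)
    pi_c = greedy_policy(P, r, c, lam_hi, gamma)
    cost_r = discounted_value(P, c, pi_r, gamma)[s0]
    cost_c = discounted_value(P, c, pi_c, gamma)[s0]
    if cost_r <= budget:
        return pi_r, pi_c, 1.0               # reward-greedy policy already feasible
    # Stage 2: bisect the mixture weight alpha placed on pi_r. Randomizing
    # over which deterministic policy runs an entire episode makes the mixed
    # discounted cost linear in alpha, so bisection finds the largest
    # feasible weight.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        alpha = 0.5 * (lo + hi)
        if alpha * cost_r + (1 - alpha) * cost_c <= budget:
            lo = alpha
        else:
            hi = alpha
    return pi_r, pi_c, lo                    # run pi_r w.p. lo, pi_c otherwise

The mixture is taken at the trajectory level (draw one of the two deterministic policies at time 0 and follow it forever), which is what makes the mixed discounted cost linear in the weight and the bisection valid.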

Cite this Paper


BibTeX
@InProceedings{pmlr-v129-chen20b,
  title     = {Constrained Reinforcement Learning via Policy Splitting},
  author    = {Chen, Haoxian and Lam, Henry and Li, Fengpei and Meisami, Amirhossein},
  booktitle = {Proceedings of The 12th Asian Conference on Machine Learning},
  pages     = {209--224},
  year      = {2020},
  editor    = {Pan, Sinno Jialin and Sugiyama, Masashi},
  volume    = {129},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--20 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v129/chen20b/chen20b.pdf},
  url       = {https://proceedings.mlr.press/v129/chen20b.html}
}
Endnote
%0 Conference Paper
%T Constrained Reinforcement Learning via Policy Splitting
%A Haoxian Chen
%A Henry Lam
%A Fengpei Li
%A Amirhossein Meisami
%B Proceedings of The 12th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Sinno Jialin Pan
%E Masashi Sugiyama
%F pmlr-v129-chen20b
%I PMLR
%P 209--224
%U https://proceedings.mlr.press/v129/chen20b.html
%V 129
APA
Chen, H., Lam, H., Li, F., & Meisami, A. (2020). Constrained Reinforcement Learning via Policy Splitting. Proceedings of The 12th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 129:209-224. Available from https://proceedings.mlr.press/v129/chen20b.html.