Constrained Offline Policy Optimization
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:17801-17810, 2022.
Abstract
In this work we introduce Constrained Offline Policy Optimization (COPO), an offline policy optimization algorithm for learning in MDPs with cost constraints. COPO is built on a novel offline cost-projection method, which we formally derive and analyze. Our method improves upon the state of the art in offline constrained policy optimization by explicitly accounting for distributional shift and by providing non-asymptotic confidence bounds on the cost of a policy; existing techniques, by contrast, only guarantee convergence to a point estimate. We formally analyze our method and empirically demonstrate that it achieves state-of-the-art performance on discrete and continuous control problems while retaining these stronger theoretical guarantees.
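For orientation, the following is a minimal sketch of the standard constrained-MDP objective that settings of this kind typically target; the notation (reward r, cost c, discount gamma, cost budget d) is assumed here and may differ from the paper's own formulation.

% Standard constrained-MDP objective (assumed notation, not the paper's):
% maximize expected discounted return subject to a bound on expected discounted cost.
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d

In the offline setting, both expectations must be estimated from a fixed dataset rather than from fresh rollouts, which is where distributional shift and confidence bounds on the cost term become central.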