Polynomial-Time Approximability of Constrained Reinforcement Learning

Jeremy Mcmahan

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43417-43439, 2025.

Abstract

We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-mcmahan25b,
  title     = {Polynomial-Time Approximability of Constrained Reinforcement Learning},
  author    = {Mcmahan, Jeremy},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {43417--43439},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/mcmahan25b/mcmahan25b.pdf},
  url       = {https://proceedings.mlr.press/v267/mcmahan25b.html},
  abstract  = {We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.}
}

EndNote

%0 Conference Paper
%T Polynomial-Time Approximability of Constrained Reinforcement Learning
%A Jeremy Mcmahan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-mcmahan25b
%I PMLR
%P 43417--43439
%U https://proceedings.mlr.press/v267/mcmahan25b.html
%V 267
%X We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.

APA

Mcmahan, J. (2025). Polynomial-Time Approximability of Constrained Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43417-43439. Available from https://proceedings.mlr.press/v267/mcmahan25b.html.
