Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions

Zihao Deng; Siddartha Devic; Brendan Juba

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:11280-11304, 2022.

Abstract

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure, which may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL in Factored State MDPs (a generalization of FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it requires only a linear value function with a suitable local basis with respect to the factorization, which permits efficient variable elimination. Under this assumption, we solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions of the individual factors are conditionally independent.
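The abstract combines three ingredients: a value function that is linear in local basis functions, variable elimination, and a separation oracle. The sketch below illustrates how these fit together in general; it is not the authors' algorithm. It assumes binary state variables, explicitly tabulated local functions, and a hypothetical constraint family of the form $\sum_k w_k h_k(x) \ge b(x)$ for every state $x$, where each $h_k$ and each term of $b$ depends on only a few variables. Because the violation $b(x) - \sum_k w_k h_k(x)$ is then a sum of local terms, max-sum variable elimination can find the most violated state without enumerating all $2^n$ states. All names (`max_sum_variable_elimination`, `separation_oracle`, `basis`, `target`) are hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of a local function: (scope, table), where
# scope is a tuple of binary state-variable indices and table maps every
# 0/1 assignment of those variables (in scope order) to a real value.
Factor = Tuple[Tuple[int, ...], Dict[Tuple[int, ...], float]]


def max_sum_variable_elimination(
    n_vars: int, factors: List[Factor], order: List[int]
) -> Tuple[float, List[int]]:
    """Maximize sum_j f_j(x) over x in {0,1}^n_vars by variable elimination.

    `order` must be a permutation of range(n_vars). Cost is exponential only
    in the largest intermediate scope, not in n_vars.
    """
    factors = list(factors)
    traceback = []  # (variable, context scope, best value of variable per context)
    for v in order:
        touching = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        scope = tuple(sorted({u for s, _ in touching for u in s} - {v}))
        table: Dict[Tuple[int, ...], float] = {}
        best: Dict[Tuple[int, ...], int] = {}
        for ctx in itertools.product((0, 1), repeat=len(scope)):
            assign = dict(zip(scope, ctx))
            best_val, best_x = float("-inf"), 0
            for xv in (0, 1):
                assign[v] = xv
                val = sum(t[tuple(assign[u] for u in s)] for s, t in touching)
                if val > best_val:
                    best_val, best_x = val, xv
            table[ctx], best[ctx] = best_val, best_x
        # Replace the factors mentioning v by one new factor over `scope`.
        factors.append((scope, table))
        traceback.append((v, scope, best))
    # Every remaining factor now has an empty scope.
    total = sum(t[()] for _, t in factors)
    # Recover a maximizing assignment in reverse elimination order.
    x: Dict[int, int] = {}
    for v, scope, best in reversed(traceback):
        x[v] = best[tuple(x[u] for u in scope)]
    return total, [x[i] for i in range(n_vars)]


def separation_oracle(
    w: List[float],
    basis: List[Factor],
    target: List[Factor],
    n_vars: int,
    order: List[int],
    tol: float = 1e-9,
) -> Optional[Tuple[List[float], float]]:
    """Check w against the constraints sum_k w[k] h_k(x) >= b(x) for all x.

    h_k are the local basis functions in `basis`; b is the sum of the local
    functions in `target`. Returns None if no constraint is violated by more
    than `tol`; otherwise returns (a, rhs) describing the most violated
    constraint a . w >= rhs, with a[k] = h_k(x*) and rhs = b(x*).
    """
    # Violation g(x) = b(x) - sum_k w[k] h_k(x) is itself a sum of local terms.
    neg_weighted = [(s, {c: -wk * val for c, val in t.items()})
                    for wk, (s, t) in zip(w, basis)]
    worst, x = max_sum_variable_elimination(n_vars, neg_weighted + target, order)
    if worst <= tol:
        return None
    a = [t[tuple(x[u] for u in s)] for s, t in basis]
    rhs = sum(t[tuple(x[u] for u in s)] for s, t in target)
    return a, rhs


# Usage: three binary variables, two basis functions, one target term.
basis = [((0,), {(0,): 0.0, (1,): 1.0}),
         ((1, 2), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0})]
target = [((0, 1), {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0})]
print(separation_oracle([1.0, 1.0], basis, target, 3, [0, 1, 2]))
# -> ([1.0, 0.0], 2.0): state x = (1, 1, 0) yields the cut w[0] >= 2.0
```

The design point is that an ellipsoid-style convex optimizer only needs one violated hyperplane per query, so the oracle never has to list every state; the chosen elimination order determines the largest intermediate table, and hence the running time.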

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-deng22c,
  title     = {Polynomial Time Reinforcement Learning in Factored State {MDPs} with Linear Value Functions},
  author    = {Deng, Zihao and Devic, Siddartha and Juba, Brendan},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {11280--11304},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/deng22c/deng22c.pdf},
  url       = {https://proceedings.mlr.press/v151/deng22c.html},
  abstract  = {Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure, that may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. With this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent.}
}

Endnote

%0 Conference Paper
%T Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions
%A Zihao Deng
%A Siddartha Devic
%A Brendan Juba
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-deng22c
%I PMLR
%P 11280--11304
%U https://proceedings.mlr.press/v151/deng22c.html
%V 151
%X Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure, that may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. With this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent.

APA


Deng, Z., Devic, S., & Juba, B. (2022). Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:11280-11304. Available from https://proceedings.mlr.press/v151/deng22c.html.
