Optimistic planning for Markov decision processes

Lucian Busoniu; Remi Munos

Optimistic planning for Markov decision processes

Lucian Busoniu, Remi Munos

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:182-189, 2012.

Abstract

The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence on the state space size. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that optimistically explores a tree representation of the space of closed-loop policies, and we analyze the near-optimality of the action it returns after n tree node expansions. While this optimistic planning requires a finite number of actions and possible next states for each transition, its asymptotic performance does not depend directly on these numbers, but only on the subset of nodes that significantly impact near-optimal policies. We characterize this set by introducing a novel measure of problem complexity, called the near-optimality exponent. Specializing the exponent and performance bound for some interesting classes of MDPs illustrates the algorithm works better when there are fewer near-optimal policies and less uniform transition probabilities.

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-busoniu12,
  title = 	 {Optimistic planning for Markov decision processes},
  author = 	 {Busoniu, Lucian and Munos, Remi},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {182--189},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/busoniu12/busoniu12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/busoniu12.html},
  abstract = 	 {The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence on the state space size. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that optimistically explores a tree representation of the space of closed-loop policies, and we analyze the near-optimality of the action it returns after n tree node expansions. While this optimistic planning requires a finite number of actions and possible next states for each transition, its asymptotic performance does not depend directly on these numbers, but only on the subset of nodes that significantly impact near-optimal policies. We characterize this set by introducing a novel measure of problem complexity, called the near-optimality exponent. Specializing the exponent and performance bound for some interesting classes of MDPs illustrates the algorithm works better when there are fewer near-optimal policies and less uniform transition probabilities.}
}

Endnote

%0 Conference Paper
%T Optimistic planning for Markov decision processes
%A Lucian Busoniu
%A Remi Munos
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami	
%F pmlr-v22-busoniu12
%I PMLR
%P 182--189
%U https://proceedings.mlr.press/v22/busoniu12.html
%V 22
%X The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence on the state space size. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that optimistically explores a tree representation of the space of closed-loop policies, and we analyze the near-optimality of the action it returns after n tree node expansions. While this optimistic planning requires a finite number of actions and possible next states for each transition, its asymptotic performance does not depend directly on these numbers, but only on the subset of nodes that significantly impact near-optimal policies. We characterize this set by introducing a novel measure of problem complexity, called the near-optimality exponent. Specializing the exponent and performance bound for some interesting classes of MDPs illustrates the algorithm works better when there are fewer near-optimal policies and less uniform transition probabilities.

RIS


TY  - CPAPER
TI  - Optimistic planning for Markov decision processes
AU  - Lucian Busoniu
AU  - Remi Munos
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/03/21
ED  - Neil D. Lawrence
ED  - Mark Girolami	
ID  - pmlr-v22-busoniu12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 182
EP  - 189
L1  - http://proceedings.mlr.press/v22/busoniu12/busoniu12.pdf
UR  - https://proceedings.mlr.press/v22/busoniu12.html
AB  - The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence on the state space size. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that optimistically explores a tree representation of the space of closed-loop policies, and we analyze the near-optimality of the action it returns after n tree node expansions. While this optimistic planning requires a finite number of actions and possible next states for each transition, its asymptotic performance does not depend directly on these numbers, but only on the subset of nodes that significantly impact near-optimal policies. We characterize this set by introducing a novel measure of problem complexity, called the near-optimality exponent. Specializing the exponent and performance bound for some interesting classes of MDPs illustrates the algorithm works better when there are fewer near-optimal policies and less uniform transition probabilities.
ER  -

APA


Busoniu, L. & Munos, R.. (2012). Optimistic planning for Markov decision processes. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:182-189 Available from https://proceedings.mlr.press/v22/busoniu12.html.

Optimistic planning for Markov decision processes

Abstract

Cite this Paper

Related Material