Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations

Timothy Mann; Shie Mannor

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):127-135, 2014.

Abstract

We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies deriving better policies with fewer iterations.
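To make the abstract's convergence claim concrete, here is a minimal tabular sketch. It is not the paper's algorithm or either of its experimental tasks: the chain MDP, the reward of 1 on reaching the goal, the discount factor, the option length, and the function name `sweeps_to_converge` are all assumptions made for illustration, and exact tabular backups stand in for the approximate ones the paper analyzes. It counts how many value-iteration sweeps are needed from a pessimistic (all-zero) start, with and without one temporally extended "move right several steps" option.

```python
def sweeps_to_converge(n_states=21, gamma=0.9, option_len=None, tol=1e-9):
    """Count value-iteration sweeps on a deterministic chain until V stops changing.

    States are 0..n_states-1, and the last state is a terminal goal.
    Arriving at the goal pays reward 1; every other step pays 0.
    """
    goal = n_states - 1
    # Each choice is a jump length: 1 is the primitive step, option_len is the option.
    jumps = [1] if option_len is None else [1, option_len]
    values = [0.0] * n_states  # pessimistic start: the true values are all >= 0
    for sweep in range(1, 10_000):
        new_values = [0.0] * n_states  # the terminal goal keeps value 0
        for s in range(goal):
            backups = []
            for jump in jumps:
                steps = min(jump, goal - s)  # the option stops when it hits the goal
                nxt = s + steps
                # The reward of 1 arrives on the last step, so discount it by steps - 1.
                reward = gamma ** (steps - 1) if nxt == goal else 0.0
                backups.append(reward + gamma ** steps * values[nxt])
            new_values[s] = max(backups)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return sweep, new_values
        values = new_values
    raise RuntimeError("value iteration did not converge")


primitive_sweeps, primitive_values = sweeps_to_converge()
option_sweeps, option_values = sweeps_to_converge(option_len=5)
print(primitive_sweeps, option_sweeps)
```

In this toy setting both runs reach the same value function, but the run with the option needs far fewer sweeps because each backup carries value information several states at a time. The option here happens to follow the optimal policy; the abstract's stronger claim, that suboptimal and sparsely placed options can still help, is not demonstrated by this sketch.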

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-mann14,
  title = 	 {Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations},
  author = 	 {Mann, Timothy and Mannor, Shie},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {127--135},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/mann14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/mann14.html},
  abstract = 	 {We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies deriving better policies with fewer iterations.}
}

Endnote

%0 Conference Paper
%T Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations
%A Timothy Mann
%A Shie Mannor
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-mann14
%I PMLR
%P 127--135
%U https://proceedings.mlr.press/v32/mann14.html
%V 32
%N 1
%X We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies deriving better policies with fewer iterations.

RIS


TY  - CPAPER
TI  - Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations
AU  - Timothy Mann
AU  - Shie Mannor
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-mann14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 127
EP  - 135
L1  - http://proceedings.mlr.press/v32/mann14.pdf
UR  - https://proceedings.mlr.press/v32/mann14.html
AB  - We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies deriving better policies with fewer iterations.
ER  -

APA


Mann, T. & Mannor, S. (2014). Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):127-135. Available from https://proceedings.mlr.press/v32/mann14.html.
