Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations

Timothy Mann, Shie Mannor
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):127-135, 2014.

Abstract

We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that, for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions, even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies that better policies can be derived with fewer iterations.
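The convergence effect is easy to see in miniature. The sketch below is purely illustrative and is not the paper's experimental setup (the chain MDP, the K-step option, and all parameter values are assumptions made for this example): it runs value iteration on a deterministic chain from a pessimistic all-zero initialization, once with only a primitive one-step action and once with an additional K-step option backed up SMDP-style, i.e., the reward accumulated over the option's duration plus the continuation value discounted by gamma^K.

```python
import numpy as np

# Illustrative sketch only (not the paper's setup): a deterministic chain
# MDP with states 0..N-1, where state N-1 is an absorbing goal. We compare
# value iteration with primitive actions only against value iteration that
# also uses a K-step option, starting from a pessimistic (all-zero) V.
N = 50        # number of states; state N-1 is the absorbing goal
GAMMA = 0.95  # discount factor
K = 10        # assumed option duration: advances K states in K steps

def backup_primitive(V, s):
    # Primitive action "right": one step, reward 1 on entering the goal.
    s2 = min(s + 1, N - 1)
    r = 1.0 if s2 == N - 1 and s != N - 1 else 0.0
    return r + GAMMA * V[s2]

def backup_option(V, s):
    # SMDP-style option backup: reward accumulated over the option's
    # duration k, continuation value discounted by GAMMA**k.
    s2 = min(s + K, N - 1)
    k = s2 - s  # actual duration (shorter near the goal)
    r = (GAMMA ** (k - 1)) if s2 == N - 1 and s != N - 1 else 0.0
    return r + (GAMMA ** k) * V[s2] if k > 0 else V[s]

def value_iteration(use_option, tol=1e-6):
    V = np.zeros(N)  # pessimistic initialization: true values are all >= 0
    for it in range(1, 10_000):
        V_new = np.array([
            max([backup_primitive(V, s)] +
                ([backup_option(V, s)] if use_option else []))
            for s in range(N)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return it  # number of sweeps until (numerical) convergence
        V = V_new
    return None

print("iterations, primitives only:", value_iteration(False))
print("iterations, with option:   ", value_iteration(True))
```

Because each option backup propagates value information roughly K states per sweep, the run with the option converges in roughly a factor of K fewer iterations. The all-zero initialization is pessimistic here because all true values are nonnegative, which is exactly the regime for which the paper's speed-up result is stated.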

Cite this Paper

BibTeX
@InProceedings{pmlr-v32-mann14,
  title     = {Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations},
  author    = {Mann, Timothy and Mannor, Shie},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {127--135},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {1},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/mann14.pdf},
  url       = {https://proceedings.mlr.press/v32/mann14.html},
  abstract  = {We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that, for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions, even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies that better policies can be derived with fewer iterations.}
}
Endnote
%0 Conference Paper
%T Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations
%A Timothy Mann
%A Shie Mannor
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-mann14
%I PMLR
%P 127--135
%U https://proceedings.mlr.press/v32/mann14.html
%V 32
%N 1
%X We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that, for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions, even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies that better policies can be derived with fewer iterations.
RIS
TY - CPAPER
TI - Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations
AU - Timothy Mann
AU - Shie Mannor
BT - Proceedings of the 31st International Conference on Machine Learning
DA - 2014/01/27
ED - Eric P. Xing
ED - Tony Jebara
ID - pmlr-v32-mann14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 32
IS - 1
SP - 127
EP - 135
L1 - http://proceedings.mlr.press/v32/mann14.pdf
UR - https://proceedings.mlr.press/v32/mann14.html
AB - We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that, for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions, even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies that better policies can be derived with fewer iterations.
ER -
APA
Mann, T. & Mannor, S. (2014). Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):127-135. Available from https://proceedings.mlr.press/v32/mann14.html.
