Per-Decision Option Discounting

Anna Harutyunyan; Peter Vrancx; Philippe Hamel; Ann Nowe; Doina Precup

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2644-2652, 2019.

Abstract

In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that naturally scales the agent’s horizon with option length. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-harutyunyan19a,
  title     = {Per-Decision Option Discounting},
  author    = {Harutyunyan, Anna and Vrancx, Peter and Hamel, Philippe and Nowe, Ann and Precup, Doina},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2644--2652},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/harutyunyan19a/harutyunyan19a.pdf},
  url       = {https://proceedings.mlr.press/v97/harutyunyan19a.html},
  abstract  = {In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that naturally scales the agent’s horizon with option length. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.}
}

EndNote

%0 Conference Paper
%T Per-Decision Option Discounting
%A Anna Harutyunyan
%A Peter Vrancx
%A Philippe Hamel
%A Ann Nowe
%A Doina Precup
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-harutyunyan19a
%I PMLR
%P 2644--2652
%U https://proceedings.mlr.press/v97/harutyunyan19a.html
%V 97
%X In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that naturally scales the agent’s horizon with option length. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.

APA


Harutyunyan, A., Vrancx, P., Hamel, P., Nowe, A., & Precup, D. (2019). Per-Decision Option Discounting. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2644-2652. Available from https://proceedings.mlr.press/v97/harutyunyan19a.html.
