The Termination Critic

Anna Harutyunyan; Will Dabney; Diana Borsa; Nicolas Heess; Remi Munos; Doina Precup

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2231-2240, 2019.

Abstract

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function rather than, as is common, the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding, arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-harutyunyan19a,
  title     = {The Termination Critic},
  author    = {Harutyunyan, Anna and Dabney, Will and Borsa, Diana and Heess, Nicolas and Munos, Remi and Precup, Doina},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {2231--2240},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/harutyunyan19a/harutyunyan19a.pdf},
  url       = {https://proceedings.mlr.press/v89/harutyunyan19a.html},
  abstract  = {In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function, as opposed to - as is common - the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding - arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.}
}

Endnote

%0 Conference Paper
%T The Termination Critic
%A Anna Harutyunyan
%A Will Dabney
%A Diana Borsa
%A Nicolas Heess
%A Remi Munos
%A Doina Precup
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-harutyunyan19a
%I PMLR
%P 2231--2240
%U https://proceedings.mlr.press/v89/harutyunyan19a.html
%V 89
%X In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function, as opposed to - as is common - the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding - arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.

APA


Harutyunyan, A., Dabney, W., Borsa, D., Heess, N., Munos, R. & Precup, D. (2019). The Termination Critic. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2231-2240. Available from https://proceedings.mlr.press/v89/harutyunyan19a.html.
