Learning Routines for Effective Off-Policy Reinforcement Learning

Edoardo Cetin; Oya Celiktutan

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1384-1394, 2021.

Abstract

The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable yet granular enough to permit flexible behavior. So far, this process has involved non-trivial user choices in terms of the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-cetin21a,
  title = 	 {Learning Routines for Effective Off-Policy Reinforcement Learning},
  author =       {Cetin, Edoardo and Celiktutan, Oya},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {1384--1394},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/cetin21a/cetin21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/cetin21a.html},
  abstract = 	 {The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable, yet, granular enough to permit flexible behavior. So far, this process involved non-trivial user choices in terms of the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of `equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.}
}

Endnote

%0 Conference Paper
%T Learning Routines for Effective Off-Policy Reinforcement Learning
%A Edoardo Cetin
%A Oya Celiktutan
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cetin21a
%I PMLR
%P 1384--1394
%U https://proceedings.mlr.press/v139/cetin21a.html
%V 139
%X The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable, yet, granular enough to permit flexible behavior. So far, this process involved non-trivial user choices in terms of the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions with arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.

APA

Cetin, E. & Celiktutan, O. (2021). Learning Routines for Effective Off-Policy Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1384-1394. Available from https://proceedings.mlr.press/v139/cetin21a.html.
