Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Elad Sarafian; Shai Keynan; Sarit Kraus

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9301-9312, 2021.

Abstract

The Reinforcement Learning (RL) building blocks, i.e., $Q$-functions and policy networks, usually take elements from the Cartesian product of two domains as input. In particular, the input of the $Q$-function is both the state and the action, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore these variables’ underlying interpretations and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and high-variance learning steps in Meta-RL algorithms. To consider the interaction between the input variables, we suggest using a Hypernetwork architecture where a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning step variance, which both accelerates learning and improves the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
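The Hypernetwork idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the state feeds the primary network and the action feeds the dynamic network (the abstract does not fix these roles), and it uses a purely linear dynamic network for brevity. The class name `HyperQ` and all layer sizes are hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class HyperQ:
    """Q(s, a): a primary network on s emits the weights of a dynamic network on a."""

    def __init__(self, state_dim, action_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        self.action_dim = action_dim
        self.l1 = init_layer(rng, state_dim, hidden)
        # Output head: action_dim weights plus one bias for the dynamic network.
        self.l2 = init_layer(rng, hidden, action_dim + 1)

    def dynamic_params(self, state):
        """Primary network: map the state to (weights, bias) of the dynamic network."""
        h = [math.tanh(v) for v in linear(*self.l1, state)]
        out = linear(*self.l2, h)
        return out[:self.action_dim], out[self.action_dim]

    def q_value(self, state, action):
        """Dynamic network: a linear function of the action, conditioned on the state."""
        w, b = self.dynamic_params(state)
        return sum(wi * ai for wi, ai in zip(w, action)) + b

    def action_gradient(self, state):
        """With a linear dynamic network, dQ/da equals the generated weight vector."""
        w, _ = self.dynamic_params(state)
        return w
```

In this simplified form the action gradient is produced directly by the primary network instead of passing through a shared stack of layers over a concatenated `[state, action]` vector. That direct coupling between the two inputs is what the abstract refers to as considering their interaction.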

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-sarafian21a,
  title     = {Recomposing the Reinforcement Learning Building Blocks with Hypernetworks},
  author    = {Sarafian, Elad and Keynan, Shai and Kraus, Sarit},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {9301--9312},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/sarafian21a/sarafian21a.pdf},
  url       = {https://proceedings.mlr.press/v139/sarafian21a.html},
  abstract = 	 {The Reinforcement Learning (RL) building blocks, i.e. $Q$-functions and policy networks, usually take elements from the cartesian product of two domains as input. In particular, the input of the $Q$-function is both the state and the action, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore these variables’ underlying interpretations and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and high variance learning steps in Meta-RL algorithms. To consider the interaction between the input variables, we suggest using a Hypernetwork architecture where a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning step variance, which both accelerates learning and improves the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).}
}

Endnote

%0 Conference Paper
%T Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
%A Elad Sarafian
%A Shai Keynan
%A Sarit Kraus
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-sarafian21a
%I PMLR
%P 9301--9312
%U https://proceedings.mlr.press/v139/sarafian21a.html
%V 139
%X The Reinforcement Learning (RL) building blocks, i.e. $Q$-functions and policy networks, usually take elements from the cartesian product of two domains as input. In particular, the input of the $Q$-function is both the state and the action, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore these variables’ underlying interpretations and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and high variance learning steps in Meta-RL algorithms. To consider the interaction between the input variables, we suggest using a Hypernetwork architecture where a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning step variance, which both accelerates learning and improves the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).

APA

Sarafian, E., Keynan, S. & Kraus, S. (2021). Recomposing the Reinforcement Learning Building Blocks with Hypernetworks. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9301-9312. Available from https://proceedings.mlr.press/v139/sarafian21a.html.
