Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective

Florin Gogianu; Tudor Berariu; Mihaela C Rosca; Claudia Clopath; Lucian Busoniu; Razvan Pascanu

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3734-3744, 2021.

Abstract

Most of the recent deep reinforcement learning advances take an RL-centric perspective and focus on refinements of the training objective. We diverge from this view and show we can recover the performance of these developments not by changing the objective, but by regularising the value-function estimator. Constraining the Lipschitz constant of a single layer using spectral normalisation is sufficient to elevate the performance of a Categorical-DQN agent to that of a more elaborated agent on the challenging Atari domain. We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning.
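
As a rough illustration of what "constraining the Lipschitz constant of a single layer using spectral normalisation" can look like, the sketch below estimates the largest singular value of one weight matrix by power iteration and divides the weights by it. This is a minimal, generic sketch and not the authors' implementation: the function names (`spectral_norm`, `spectrally_normalise`), the iteration count, and the use of plain Python lists are all assumptions made for illustration, and the abstract does not say which layer is normalised or how the estimate is maintained during training.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def normalise(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration.

    n_iters and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    Wt = transpose(W)
    # Random starting vector with one entry per row of W.
    u = normalise([rng.gauss(0.0, 1.0) for _ in W])
    for _ in range(n_iters):
        v = normalise(matvec(Wt, u))  # approximate right singular vector
        u = normalise(matvec(W, v))   # approximate left singular vector
    # sigma is approximated by u^T W v.
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectrally_normalise(W, n_iters=50):
    """Rescale W so that its largest singular value is roughly 1.

    Assumes W is not the zero matrix (sigma would be 0 otherwise).
    """
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]
```

Since a linear map `x -> W x` is Lipschitz with constant equal to the largest singular value of `W`, the rescaled layer has a Lipschitz constant of roughly 1 with respect to the Euclidean norm. For example, `spectrally_normalise([[3.0, 0.0], [0.0, 1.0]])` divides every entry by a value close to 3.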

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-gogianu21a,
  title = 	 {Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective},
  author =       {Gogianu, Florin and Berariu, Tudor and Rosca, Mihaela C and Clopath, Claudia and Busoniu, Lucian and Pascanu, Razvan},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {3734--3744},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/gogianu21a/gogianu21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/gogianu21a.html},
  abstract = 	 {Most of the recent deep reinforcement learning advances take an RL-centric perspective and focus on refinements of the training objective. We diverge from this view and show we can recover the performance of these developments not by changing the objective, but by regularising the value-function estimator. Constraining the Lipschitz constant of a single layer using spectral normalisation is sufficient to elevate the performance of a Categorical-DQN agent to that of a more elaborated agent on the challenging Atari domain. We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning.}
}

Endnote

%0 Conference Paper
%T Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective
%A Florin Gogianu
%A Tudor Berariu
%A Mihaela C Rosca
%A Claudia Clopath
%A Lucian Busoniu
%A Razvan Pascanu
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-gogianu21a
%I PMLR
%P 3734--3744
%U https://proceedings.mlr.press/v139/gogianu21a.html
%V 139
%X Most of the recent deep reinforcement learning advances take an RL-centric perspective and focus on refinements of the training objective. We diverge from this view and show we can recover the performance of these developments not by changing the objective, but by regularising the value-function estimator. Constraining the Lipschitz constant of a single layer using spectral normalisation is sufficient to elevate the performance of a Categorical-DQN agent to that of a more elaborated agent on the challenging Atari domain. We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning.

APA

Gogianu, F., Berariu, T., Rosca, M.C., Clopath, C., Busoniu, L. & Pascanu, R. (2021). Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3734-3744. Available from https://proceedings.mlr.press/v139/gogianu21a.html.
