Decoupled Neural Interfaces using Synthetic Gradients

Max Jaderberg; Wojciech Marian Czarnecki; Simon Osindero; Oriol Vinyals; Alex Graves; David Silver; Koray Kavukcuoglu

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1627-1635, 2017.

Abstract

Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one’s future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
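
The mechanism summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-layer scalar model, the target `t = 2 * x`, the linear gradient model `g_hat = a * h + b * t`, the learning rates, and the function name are all assumptions chosen to keep the example small. Feeding the target `t` to the gradient model is likewise a choice made for this toy problem. Layer 1 updates immediately from a predicted gradient, and the gradient model is then regressed onto the true gradient once layer 2 has computed it.

```python
import random


def train_with_synthetic_gradients(steps=3000, lr=0.05, sg_lr=0.05, seed=0):
    """Toy two-layer scalar model y = w2 * (w1 * x), fit to t = 2 * x."""
    rng = random.Random(seed)
    w1, w2 = 0.5, 0.5   # layer 1 and layer 2 weights (hypothetical toy model)
    a, b = 0.0, 0.0     # synthetic-gradient model: g_hat = a * h + b * t

    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        t = 2.0 * x

        # Layer 1: forward pass, then an immediate update from the
        # predicted gradient dL/dh; it does not wait for layer 2.
        h = w1 * x
        g_hat = a * h + b * t
        w1 -= lr * g_hat * x

        # Layer 2: forward pass and true gradients of L = 0.5 * (y - t)^2.
        y = w2 * h
        err = y - t
        g_true = err * w2   # true dL/dh, computed before w2 changes
        w2 -= lr * err * h

        # Synthetic-gradient model: regress g_hat onto the true gradient.
        diff = g_hat - g_true
        a -= sg_lr * diff * h
        b -= sg_lr * diff * t

    return w1, w2


if __name__ == "__main__":
    w1, w2 = train_with_synthetic_gradients()
    print("w1 * w2 =", w1 * w2)
```

In this sketch the true gradient is still computed on every step, so only the update of layer 1 is decoupled. In an asynchronous setting the regression step could run later or less often than the layer 1 update.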

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-jaderberg17a,
  title     = {Decoupled Neural Interfaces using Synthetic Gradients},
  author    = {Max Jaderberg and Wojciech Marian Czarnecki and Simon Osindero and Oriol Vinyals and Alex Graves and David Silver and Koray Kavukcuoglu},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {1627--1635},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/jaderberg17a/jaderberg17a.pdf},
  url       = {https://proceedings.mlr.press/v70/jaderberg17a.html},
  abstract = 	 {Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one’s future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.}
}

Endnote

%0 Conference Paper
%T Decoupled Neural Interfaces using Synthetic Gradients
%A Max Jaderberg
%A Wojciech Marian Czarnecki
%A Simon Osindero
%A Oriol Vinyals
%A Alex Graves
%A David Silver
%A Koray Kavukcuoglu
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-jaderberg17a
%I PMLR
%P 1627--1635
%U https://proceedings.mlr.press/v70/jaderberg17a.html
%V 70
%X Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one’s future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.

APA


Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O., Graves, A., Silver, D., & Kavukcuoglu, K. (2017). Decoupled Neural Interfaces using Synthetic Gradients. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1627-1635. Available from https://proceedings.mlr.press/v70/jaderberg17a.html.
