Practical Gauss-Newton Optimisation for Deep Learning

Aleksandar Botev, Hippolyt Ritter, David Barber
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:557-565, 2017.

Abstract

We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
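To make the quantity being approximated concrete, the sketch below builds per-layer (block-diagonal) Gauss-Newton blocks G_λ = J_λᵀ H_L J_λ by brute force for a tiny two-layer feedforward network with a softmax cross-entropy loss. This is only an illustrative reconstruction of the block-diagonal idea, not the authors' recursive algorithm; the network, layer sizes, and variable names are invented for the example.

import numpy as np

# Illustrative sketch (not the paper's algorithm): brute-force block-diagonal
# Gauss-Newton blocks for a tiny two-layer feedforward net.
rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 5, 3
x = rng.normal(size=d_in)                # single input
W1 = rng.normal(size=(d_hid, d_in)) * 0.5
W2 = rng.normal(size=(d_out, d_hid)) * 0.5

# Forward pass
a = W1 @ x                               # hidden pre-activation
h = np.maximum(a, 0.0)                   # ReLU (piecewise linear transfer function)
z = W2 @ h                               # network output (logits)

# Hessian of the softmax cross-entropy loss w.r.t. the logits:
# H_L = diag(p) - p p^T (independent of the true label).
p = np.exp(z - z.max()); p /= p.sum()
H_L = np.diag(p) - np.outer(p, p)

# Jacobians of the output z w.r.t. each layer's (flattened) weights.
# dz_i/dW2[k, j] = delta_{ik} * h[j]
J_W2 = np.einsum('ik,j->ikj', np.eye(d_out), h).reshape(d_out, -1)
# dz_i/dW1[m, j] = W2[i, m] * relu'(a_m) * x[j]
relu_grad = (a > 0).astype(float)
J_W1 = np.einsum('im,m,j->imj', W2, relu_grad, x).reshape(d_out, -1)

# Block-diagonal Gauss-Newton: keep only the per-layer blocks of J^T H_L J,
# dropping the cross-layer blocks of the full matrix.
G_W1 = J_W1.T @ H_L @ J_W1               # shape (d_hid*d_in, d_hid*d_in)
G_W2 = J_W2.T @ H_L @ J_W2               # shape (d_out*d_hid, d_out*d_hid)
print(G_W1.shape, G_W2.shape)

In a second-order update each block would typically be damped and inverted separately, e.g. solving (G_λ + λI) Δw_λ = -g_λ per layer, which is what makes the block-diagonal structure computationally attractive compared to working with the full Gauss-Newton matrix.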

Cite this Paper

BibTeX
@InProceedings{pmlr-v70-botev17a,
  title     = {Practical {G}auss-{N}ewton Optimisation for Deep Learning},
  author    = {Aleksandar Botev and Hippolyt Ritter and David Barber},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {557--565},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/botev17a/botev17a.pdf},
  url       = {https://proceedings.mlr.press/v70/botev17a.html},
  abstract  = {We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.}
}
Endnote
%0 Conference Paper
%T Practical Gauss-Newton Optimisation for Deep Learning
%A Aleksandar Botev
%A Hippolyt Ritter
%A David Barber
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-botev17a
%I PMLR
%P 557--565
%U https://proceedings.mlr.press/v70/botev17a.html
%V 70
%X We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
APA
Botev, A., Ritter, H. & Barber, D. (2017). Practical Gauss-Newton Optimisation for Deep Learning. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:557-565. Available from https://proceedings.mlr.press/v70/botev17a.html.