Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

Anna Choromanska, Benjamin Cowen, Sadhana Kumaravel, Ronny Luss, Mattia Rigotti, Irina Rish, Paolo Diachille, Viatcheslav Gurev, Brian Kingsbury, Ravi Tejwani, Djallel Bouneffouf
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1193-1202, 2019.

Abstract

Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
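The abstract's description of breaking the nested training objective into local, per-layer subproblems via auxiliary variables can be made concrete with a small sketch. The snippet below is an illustrative NumPy toy of the general auxiliary-variable alternating-minimization idea on a two-layer network, not the paper's exact algorithm: the quadratic penalty weight rho, the step sizes, the inner-loop counts, and the random regression data are all assumptions made only for this example.

# Minimal sketch of online alternating minimization with auxiliary variables
# for a two-layer network (NumPy). Activations are treated as explicit
# optimization variables coupled to the weights by quadratic penalties.
# This is NOT the paper's exact formulation; rho, lr, and the inner-loop
# counts are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, batch = 20, 32, 5, 16
W1 = rng.normal(scale=0.1, size=(d_hid, d_in))
W2 = rng.normal(scale=0.1, size=(d_out, d_hid))
rho = 1.0      # penalty weight coupling auxiliary activations to the layers
lr = 0.1       # step size for the local (per-layer) weight updates

relu = lambda z: np.maximum(z, 0.0)

for step in range(100):                      # online: one mini-batch per step
    x = rng.normal(size=(d_in, batch))
    y = rng.normal(size=(d_out, batch))      # placeholder regression targets

    # 1) Update the auxiliary hidden activations a1 with the weights fixed:
    #    minimize (rho/2)||a1 - relu(W1 x)||^2 + (1/2)||y - W2 a1||^2 over a1.
    #    A few gradient steps stand in for an exact per-sample solve.
    a1 = relu(W1 @ x)
    for _ in range(5):
        grad_a1 = rho * (a1 - relu(W1 @ x)) - W2.T @ (y - W2 @ a1)
        a1 -= 0.1 * grad_a1

    # 2) Update each layer's weights locally, with the auxiliary variables
    #    fixed. Each subproblem involves only one layer, so these updates
    #    could in principle run in parallel across layers.
    grad_W2 = -(y - W2 @ a1) @ a1.T / batch
    W2 -= lr * grad_W2
    pre = W1 @ x
    grad_W1 = (rho * (relu(pre) - a1) * (pre > 0)) @ x.T / batch
    W1 -= lr * grad_W1

Once the auxiliary activations are fixed, each weight update touches only a single layer, which is the decoupling that motivates the parallel, mini-batch (online) operation highlighted in the abstract.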

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-choromanska19a,
  title     = {Beyond Backprop: Online Alternating Minimization with Auxiliary Variables},
  author    = {Choromanska, Anna and Cowen, Benjamin and Kumaravel, Sadhana and Luss, Ronny and Rigotti, Mattia and Rish, Irina and Diachille, Paolo and Gurev, Viatcheslav and Kingsbury, Brian and Tejwani, Ravi and Bouneffouf, Djallel},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {1193--1202},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/choromanska19a/choromanska19a.pdf},
  url       = {https://proceedings.mlr.press/v97/choromanska19a.html}
}
APA
Choromanska, A., Cowen, B., Kumaravel, S., Luss, R., Rigotti, M., Rish, I., Diachille, P., Gurev, V., Kingsbury, B., Tejwani, R. & Bouneffouf, D. (2019). Beyond Backprop: Online Alternating Minimization with Auxiliary Variables. In Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research 97:1193-1202. Available from https://proceedings.mlr.press/v97/choromanska19a.html.