Tilting the playing field: Dynamical loss functions for machine learning

Miguel Ruiz-Garcia, Ge Zhang, Samuel S Schoenholz, Andrea J. Liu
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9157-9167, 2021.

Abstract

We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time. In underparameterized networks, such dynamical loss functions can lead to successful training for networks that fail to find deep minima of the standard cross-entropy loss. In overparameterized networks, dynamical loss functions can lead to better generalization. Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss. In particular, as the loss function oscillates, instabilities develop in the form of bifurcation cascades, which we study using the Hessian and Neural Tangent Kernel. Valleys in the landscape widen and deepen, and then narrow and rise as the loss landscape changes during a cycle. As the landscape narrows, the learning rate becomes too large and the network becomes unstable and bounces around the valley. This process ultimately pushes the system into deeper and wider regions of the loss landscape and is characterized by decreasing eigenvalues of the Hessian. This results in better regularized models with improved generalization performance.
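As a rough illustration of a loss that "evolves cyclically during training to emphasize one class at a time", the sketch below weights the per-example cross-entropy by a class weight that depends on the training step. The abstract does not give the functional form, so everything specific here is an assumption: the triangular-wave schedule, the rescaling so the weights sum to the number of classes, and the names `class_weights`, `dynamical_cross_entropy`, `period`, and `amplitude` are all hypothetical and are not taken from the paper.

```python
import math


def class_weights(step, num_classes, period, amplitude):
    """Hypothetical cyclic schedule that emphasizes one class at a time.

    Assumption (not from the abstract): within each window of `period` steps,
    the emphasized class's raw weight rises linearly from 1 to `amplitude`
    and falls back to 1, while the other raw weights stay at 1. The weights
    are then rescaled so that they sum to `num_classes`.
    """
    emphasized = (step // period) % num_classes
    phase = (step % period) / period        # position within the window, in [0, 1)
    tri = 1.0 - abs(2.0 * phase - 1.0)      # triangular wave: 0 -> 1 -> 0
    raw = [1.0] * num_classes
    raw[emphasized] = 1.0 + (amplitude - 1.0) * tri
    total = sum(raw)
    return [num_classes * w / total for w in raw]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def dynamical_cross_entropy(batch_logits, labels, step, period, amplitude):
    """Mean cross-entropy with each example scaled by its class's current weight."""
    num_classes = len(batch_logits[0])
    weights = class_weights(step, num_classes, period, amplitude)
    losses = [
        -weights[y] * log_softmax(z)[y]
        for z, y in zip(batch_logits, labels)
    ]
    return sum(losses) / len(losses)
```

With `amplitude = 1` every weight equals 1 and the function reduces to the standard mean cross-entropy, which makes it easy to compare a static and an oscillating loss under otherwise identical training settings.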

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-ruiz-garcia21a,
  title = {Tilting the playing field: Dynamical loss functions for machine learning},
  author = {Ruiz-Garcia, Miguel and Zhang, Ge and Schoenholz, Samuel S and Liu, Andrea J.},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {9157--9167},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/ruiz-garcia21a/ruiz-garcia21a.pdf},
  url = {https://proceedings.mlr.press/v139/ruiz-garcia21a.html},
  abstract = {We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time. In underparameterized networks, such dynamical loss functions can lead to successful training for networks that fail to find deep minima of the standard cross-entropy loss. In overparameterized networks, dynamical loss functions can lead to better generalization. Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss. In particular, as the loss function oscillates, instabilities develop in the form of bifurcation cascades, which we study using the Hessian and Neural Tangent Kernel. Valleys in the landscape widen and deepen, and then narrow and rise as the loss landscape changes during a cycle. As the landscape narrows, the learning rate becomes too large and the network becomes unstable and bounces around the valley. This process ultimately pushes the system into deeper and wider regions of the loss landscape and is characterized by decreasing eigenvalues of the Hessian. This results in better regularized models with improved generalization performance.}
}
Endnote
%0 Conference Paper
%T Tilting the playing field: Dynamical loss functions for machine learning
%A Miguel Ruiz-Garcia
%A Ge Zhang
%A Samuel S Schoenholz
%A Andrea J. Liu
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-ruiz-garcia21a
%I PMLR
%P 9157--9167
%U https://proceedings.mlr.press/v139/ruiz-garcia21a.html
%V 139
%X We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time. In underparameterized networks, such dynamical loss functions can lead to successful training for networks that fail to find deep minima of the standard cross-entropy loss. In overparameterized networks, dynamical loss functions can lead to better generalization. Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss. In particular, as the loss function oscillates, instabilities develop in the form of bifurcation cascades, which we study using the Hessian and Neural Tangent Kernel. Valleys in the landscape widen and deepen, and then narrow and rise as the loss landscape changes during a cycle. As the landscape narrows, the learning rate becomes too large and the network becomes unstable and bounces around the valley. This process ultimately pushes the system into deeper and wider regions of the loss landscape and is characterized by decreasing eigenvalues of the Hessian. This results in better regularized models with improved generalization performance.
APA
Ruiz-Garcia, M., Zhang, G., Schoenholz, S.S. & Liu, A.J. (2021). Tilting the playing field: Dynamical loss functions for machine learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9157-9167. Available from https://proceedings.mlr.press/v139/ruiz-garcia21a.html.
