On the Flatness of Loss Surface for Two-layered ReLU Networks

Jiezhang Cao, Qingyao Wu, Yuguang Yan, Li Wang, Mingkui Tan
Proceedings of the Ninth Asian Conference on Machine Learning, PMLR 77:545-560, 2017.

Abstract

Deep learning has achieved unprecedented practical success in many applications. Despite this empirical success, the theoretical understanding of deep neural networks remains a major open problem. In this paper, we explore properties of two-layered ReLU networks. For simplicity, we assume that the optimal model parameters (also called ground-truth parameters) are known. We further assume that the network receives Gaussian input and is trained by minimizing the expected squared loss between its prediction function and a target function. To conduct the analysis, we propose a normal equation for critical points and study the invariances of the loss under three kinds of transformations, namely scale, rotation, and perturbation transformations. We prove that these transformations keep the loss of a critical point invariant and can therefore give rise to flat regions. Consequently, escaping from flat regions is vital when training neural networks.
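
The abstract does not spell out the exact parameterization, so the following is only a minimal numerical sketch under assumptions of my own: a teacher-student setup with prediction function f(W, x) = sum_j max(0, w_j . x), ground-truth weights W_star, and population loss E_x[(f(W, x) - f(W_star, x))^2] for x ~ N(0, I). The names f, loss, and W_star, and the particular transformation shown, are illustrative choices rather than the paper's definitions. The sketch checks by Monte Carlo that applying the same orthogonal rotation to student and teacher weights leaves this loss unchanged, which is one concrete way a loss surface can acquire loss-invariant (flat) directions under Gaussian input.

# Minimal sketch (illustrative assumptions noted above, not the paper's exact construction).
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 4, 200_000              # input dimension, hidden units, Monte Carlo samples

def f(W, X):
    # Two-layer ReLU network with unit output weights: f(W, x) = sum_j max(0, w_j . x)
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

def loss(W, W_star, X):
    # Monte Carlo estimate of E_x[(f(W, x) - f(W_star, x))^2] with x ~ N(0, I)
    return np.mean((f(W, X) - f(W_star, X)) ** 2)

W = rng.standard_normal((k, d))       # student weights
W_star = rng.standard_normal((k, d))  # ground-truth (teacher) weights
X = rng.standard_normal((n, d))       # Gaussian inputs

# A random orthogonal matrix; rotating both weight matrices by Q leaves the
# population loss unchanged, because Qx has the same distribution as x.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
print(loss(W, W_star, X))             # original loss estimate
print(loss(W @ Q, W_star @ Q, X))     # agrees up to Monte Carlo error

ReLU's positive homogeneity, max(0, cz) = c * max(0, z) for c > 0, similarly suggests how suitable rescalings of the weights can preserve the loss; the precise scale and perturbation transformations studied by the authors are defined in the full text.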

Cite this Paper


BibTeX
@InProceedings{pmlr-v77-cao17a,
  title     = {On the Flatness of Loss Surface for Two-layered ReLU Networks},
  author    = {Cao, Jiezhang and Wu, Qingyao and Yan, Yuguang and Wang, Li and Tan, Mingkui},
  booktitle = {Proceedings of the Ninth Asian Conference on Machine Learning},
  pages     = {545--560},
  year      = {2017},
  editor    = {Zhang, Min-Ling and Noh, Yung-Kyun},
  volume    = {77},
  series    = {Proceedings of Machine Learning Research},
  address   = {Yonsei University, Seoul, Republic of Korea},
  month     = {15--17 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v77/cao17a/cao17a.pdf},
  url       = {https://proceedings.mlr.press/v77/cao17a.html}
}
Endnote
%0 Conference Paper
%T On the Flatness of Loss Surface for Two-layered ReLU Networks
%A Jiezhang Cao
%A Qingyao Wu
%A Yuguang Yan
%A Li Wang
%A Mingkui Tan
%B Proceedings of the Ninth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Min-Ling Zhang
%E Yung-Kyun Noh
%F pmlr-v77-cao17a
%I PMLR
%P 545--560
%U https://proceedings.mlr.press/v77/cao17a.html
%V 77
APA
Cao, J., Wu, Q., Yan, Y., Wang, L., & Tan, M. (2017). On the Flatness of Loss Surface for Two-layered ReLU Networks. Proceedings of the Ninth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 77:545-560. Available from https://proceedings.mlr.press/v77/cao17a.html.