Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:11173-11182, 2020.

Abstract

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains important examples such as ReLU neural networks and others with non-differentiable activation functions. First, we show that finding an epsilon-stationary point with first-order methods is impossible in finite time. Therefore, we introduce the notion of (delta, epsilon)-stationarity, a generalization that allows for a point to be within distance delta of an epsilon-stationary point and reduces to epsilon-stationarity for smooth functions. We propose a series of randomized first-order methods and analyze their complexity of finding a (delta, epsilon)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on delta. Empirically, our methods perform well for training ReLU neural networks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-zhang20p,
  title     = {Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions},
  author    = {Zhang, Jingzhao and Lin, Hongzhou and Jegelka, Stefanie and Sra, Suvrit and Jadbabaie, Ali},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {11173--11182},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/zhang20p/zhang20p.pdf},
  url       = {https://proceedings.mlr.press/v119/zhang20p.html},
  abstract  = {We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains important examples such as ReLU neural networks and others with non-differentiable activation functions. First, we show that finding an epsilon-stationary point with first-order methods is impossible in finite time. Therefore, we introduce the notion of (delta, epsilon)-stationarity, a generalization that allows for a point to be within distance delta of an epsilon-stationary point and reduces to epsilon-stationarity for smooth functions. We propose a series of randomized first-order methods and analyze their complexity of finding a (delta, epsilon)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on delta. Empirically, our methods perform well for training ReLU neural networks.}
}
Endnote
%0 Conference Paper
%T Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions
%A Jingzhao Zhang
%A Hongzhou Lin
%A Stefanie Jegelka
%A Suvrit Sra
%A Ali Jadbabaie
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-zhang20p
%I PMLR
%P 11173--11182
%U https://proceedings.mlr.press/v119/zhang20p.html
%V 119
%X We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains important examples such as ReLU neural networks and others with non-differentiable activation functions. First, we show that finding an epsilon-stationary point with first-order methods is impossible in finite time. Therefore, we introduce the notion of (delta, epsilon)-stationarity, a generalization that allows for a point to be within distance delta of an epsilon-stationary point and reduces to epsilon-stationarity for smooth functions. We propose a series of randomized first-order methods and analyze their complexity of finding a (delta, epsilon)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on delta. Empirically, our methods perform well for training ReLU neural networks.
APA
Zhang, J., Lin, H., Jegelka, S., Sra, S. & Jadbabaie, A. (2020). Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:11173-11182. Available from https://proceedings.mlr.press/v119/zhang20p.html.