Convergence guarantees for a class of non-convex and non-smooth optimization problems
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2601-2610, 2018.
Non-convex optimization problems arise frequently in machine learning, including feature selection, structured matrix learning, mixture modeling, and neural network training. We consider the problem of finding critical points of a broad class of non-convex problems with non-smooth components. We analyze the behavior of two gradient-based methods—namely a sub-gradient method, and a proximal method. Our main results are to establish rates of convergence for general problems, and also exhibit faster rates for sub-analytic functions. As an application of our theory, we obtain a simplification of the popular CCCP algorithm, which retains all the desirable convergence properties of the original method, along with a significantly lower cost per iteration. We illustrate our methods and theory via application to the problems of best subset selection, robust estimation, and shape from shading reconstruction.