Model Consistency for Learning with Mirror-Stratifiable Regularizers
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1236-1244, 2019.
Low-complexity non-smooth convex regularizers are routinely used to impose some structure (such as sparsity or low-rank) on the coefficients for linear predictors in supervised learning. Model consistency consists then in selecting the correct structure (for instance support or rank) by regularized empirical risk minimization. It is known that model consistency holds under appropriate non-degeneracy conditions. However such conditions typically fail for highly correlated designs and it is observed that regularization methods tend to select larger models. In this work, we provide the theoretical underpinning of this behavior using the notion of mirror-stratifiable regularizers. This class of regularizers encompasses the most well-known in the literature, including the L1 or trace norms. It brings into play a pair of primal-dual models, which in turn allows one to locate the structure of the solution using a specific dual certificate. We also show how this analysis is applicable to optimal solutions of the learning problem, and also to the iterates computed by a certain class of stochastic proximal-gradient algorithms.