Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning

Jihun Yun, Peng Zheng, Eunho Yang, Aurelie Lozano, Aleksandr Aravkin
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7242-7251, 2019.

Abstract

We study high-dimensional estimators with the trimmed $\ell_1$ penalty, which leaves the $h$ largest parameter entries penalty-free. While optimization techniques for this nonconvex penalty have been studied, its statistical properties have not yet been analyzed. We present the first statistical analyses for M-estimation and characterize the support recovery, $\ell_\infty$ error, and $\ell_2$ error of the trimmed $\ell_1$ estimates as a function of the trimming parameter $h$. Our results reveal different regimes depending on how $h$ compares to the true support size. Our second contribution is a new algorithm for the trimmed regularization problem that has the same theoretical convergence rate as difference-of-convex (DC) algorithms but, in practice, is faster and finds lower objective values. Empirical evaluation of $\ell_1$ trimming for sparse linear regression and graphical model estimation indicates that trimmed $\ell_1$ can outperform vanilla $\ell_1$ and nonconvex alternatives. Our last contribution is to show that the trimmed penalty is beneficial beyond M-estimation, yielding promising results for two deep learning tasks: input structure recovery and network sparsification.
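For concreteness, the trimmed $\ell_1$ penalty sums the magnitudes of all but the $h$ largest-magnitude entries, so those $h$ entries remain penalty-free. The snippet below is a minimal NumPy sketch of that penalty, not the authors' implementation; the function name trimmed_l1 and the toy vector are illustrative only.

import numpy as np

def trimmed_l1(theta, h):
    """Trimmed l1 penalty: sum of |theta_i| over all entries except the
    h largest in magnitude, which are left penalty-free (illustrative sketch)."""
    abs_theta = np.abs(np.ravel(theta))
    # Sort magnitudes in ascending order and drop the h largest from the sum.
    sorted_abs = np.sort(abs_theta)
    return sorted_abs[:max(len(sorted_abs) - h, 0)].sum()

# Example: with h = 2, the two largest entries (5.0 and -3.0) are untouched.
theta = np.array([0.1, -3.0, 0.0, 5.0, 0.4])
print(trimmed_l1(theta, h=2))  # |0.1| + |0.0| + |0.4| = 0.5

Setting $h = 0$ recovers the vanilla $\ell_1$ penalty, while larger $h$ exempts more of the largest coefficients from shrinkage.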

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-yun19a,
  title     = {Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning},
  author    = {Yun, Jihun and Zheng, Peng and Yang, Eunho and Lozano, Aurelie and Aravkin, Aleksandr},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7242--7251},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/yun19a/yun19a.pdf},
  url       = {https://proceedings.mlr.press/v97/yun19a.html}
}
Endnote
%0 Conference Paper
%T Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning
%A Jihun Yun
%A Peng Zheng
%A Eunho Yang
%A Aurelie Lozano
%A Aleksandr Aravkin
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-yun19a
%I PMLR
%P 7242--7251
%U https://proceedings.mlr.press/v97/yun19a.html
%V 97
APA
Yun, J., Zheng, P., Yang, E., Lozano, A. & Aravkin, A. (2019). Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7242-7251. Available from https://proceedings.mlr.press/v97/yun19a.html.
