Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance

Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2294-2340, 2019.

Abstract

We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels. In order to go beyond the generic analysis leading to convergence rates of the excess risk as $O(1/\sqrt{n})$ from $n$ observations, we assume that the individual losses are self-concordant, that is, their third-order derivatives are bounded by their second-order derivatives. This setting includes least-squares, as well as all generalized linear models such as logistic and softmax regression. For this class of losses, we provide a bias-variance decomposition and show that the assumptions commonly made in least-squares regression, such as the source and capacity conditions, can be adapted to obtain fast non-asymptotic rates of convergence by improving the bias terms, the variance terms or both.
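To make the setting concrete, the following worked sketch (not taken from the paper; the notation $\mathcal{H}$, $\ell$, $\lambda$, $\varphi$, $\sigma$ is introduced here for illustration) writes out the regularized empirical risk objective and checks the self-concordance condition on the logistic loss, exactly in the sense described above: third-order derivatives bounded by second-order derivatives.

The estimator minimizes a convex empirical risk plus a squared Hilbertian norm,
\[
\hat f_\lambda \in \operatorname*{arg\,min}_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, f(x_i)\big) \;+\; \frac{\lambda}{2}\, \| f \|_{\mathcal{H}}^2 .
\]
For the logistic loss, write $\varphi(t) = \log(1 + e^{-t})$ and $\sigma(t) = 1/(1+e^{-t})$. Then
\[
\varphi''(t) = \sigma(t)\big(1-\sigma(t)\big), \qquad
\varphi'''(t) = \varphi''(t)\big(1 - 2\sigma(t)\big),
\]
and since $|1 - 2\sigma(t)| \le 1$ for all $t$, we get $|\varphi'''(t)| \le \varphi''(t)$, so the logistic loss satisfies the condition. The squared loss has $\varphi''' \equiv 0$, so the bound holds trivially there as well.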

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-marteau-ferey19a,
  title     = {Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance},
  author    = {Marteau{-}Ferey, Ulysse and Ostrovskii, Dmitrii and Bach, Francis and Rudi, Alessandro},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {2294--2340},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/marteau-ferey19a/marteau-ferey19a.pdf},
  url       = {https://proceedings.mlr.press/v99/marteau-ferey19a.html}
}
Endnote
%0 Conference Paper
%T Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance
%A Ulysse Marteau-Ferey
%A Dmitrii Ostrovskii
%A Francis Bach
%A Alessandro Rudi
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-marteau-ferey19a
%I PMLR
%P 2294--2340
%U https://proceedings.mlr.press/v99/marteau-ferey19a.html
%V 99
APA
Marteau-Ferey, U., Ostrovskii, D., Bach, F., & Rudi, A. (2019). Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:2294-2340. Available from https://proceedings.mlr.press/v99/marteau-ferey19a.html.