Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers

Arnab Auddy; Haolin Zou; Kamiar Rahnamarad; Arian Maleki

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2377-2385, 2024.

Abstract

The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding method to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with twice differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence of ALO's accuracy, a theoretical understanding of its error has been lacking. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with the non-differentiable $\ell_1$ regularizer. We bound the error $|{\rm ALO}-{\rm LO}|$ in terms of intuitive metrics such as the size of leave-$i$-out perturbations in active sets, sample size $n$, number of features $p$ and signal-to-noise ratio (SNR). As a consequence, for $\ell_1$ regularized problems, we show that $|{\rm ALO}-{\rm LO}| \stackrel{p\rightarrow \infty}{\longrightarrow} 0$ while $n/p$ and SNR remain bounded.
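The abstract contrasts exact LO, which refits the model $n$ times, with ALO, which reuses a single full-data fit. As an illustration only, the sketch below assumes the simplest member of this family, squared loss with an $\ell_1$ penalty (the lasso), and uses the standard active-set form of ALO for that case: each full-data residual $y_i - x_i^\top\hat\beta$ is divided by $1 - H_{ii}$, where $H = X_E(X_E^\top X_E)^{-1}X_E^\top$ and $E$ is the active set. This is not the authors' code. The function names (`lasso_cd`, `loo_errors`, `alo_errors`), the coordinate-descent solver, the penalty scaling, and the synthetic data settings are hypothetical choices made for the example, and the paper's results cover a broader GLM family.

```python
import numpy as np

# Hypothetical helper names; a sketch for the lasso, not the authors' implementation.


def lasso_cd(X, y, lam, max_sweeps=1000, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)  # ||X_j||^2 for every column j
    r = np.array(y, dtype=float)  # running residual y - X b (b starts at zero)
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the residual that excludes b_j's contribution.
            rho = X[:, j] @ r + col_sq[j] * old
            # Soft-thresholding gives the exact coordinate-wise minimizer.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b


def loo_errors(X, y, lam):
    """Exact LO: refit n times, each time without observation i."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_minus_i = lasso_cd(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ b_minus_i) ** 2
    return errs


def alo_errors(X, y, lam):
    """ALO from one full-data fit: inflate each residual by 1 / (1 - H_ii).

    H = X_E (X_E^T X_E)^{-1} X_E^T is the hat matrix on the active set E.
    The correction is exact when dropping observation i leaves E and the
    signs of the nonzero coefficients unchanged.
    """
    b = lasso_cd(X, y, lam)
    active = np.flatnonzero(b)
    resid = y - X @ b
    if active.size == 0:
        return resid ** 2  # empty active set: H = 0, so ALO equals the full-data residuals
    X_E = X[:, active]
    # h_i = x_{i,E}^T (X_E^T X_E)^{-1} x_{i,E}, computed without forming H.
    G = np.linalg.solve(X_E.T @ X_E, X_E.T)  # shape (|E|, n)
    h = np.sum(X_E * G.T, axis=1)
    return (resid / (1.0 - h)) ** 2


# Illustrative synthetic data (all settings here are arbitrary assumptions).
rng = np.random.default_rng(0)
n, p, lam = 60, 20, 30.0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0  # sparse true signal
y = X @ beta + rng.standard_normal(n)

lo_err = loo_errors(X, y, lam)
alo_err = alo_errors(X, y, lam)
print(f"LO estimate: {lo_err.mean():.4f}  ALO estimate: {alo_err.mean():.4f}")
```

Whenever leaving out observation $i$ keeps the active set and coefficient signs unchanged, the two leave-$i$-out residuals agree up to solver tolerance; this is in line with the abstract's emphasis on leave-$i$-out perturbations of the active set as what drives $|{\rm ALO}-{\rm LO}|$.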

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-auddy24a,
  title     = {Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers},
  author    = {Auddy, Arnab and Zou, Haolin and Rahnamarad, Kamiar and Maleki, Arian},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2377--2385},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/auddy24a/auddy24a.pdf},
  url       = {https://proceedings.mlr.press/v238/auddy24a.html},
  abstract  = {The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding method to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with twice differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, the theoretical understanding of ALO's error remains unknown. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with the non-differentiable $\ell_1$ regularizer. We bound the error $|{\rm ALO}-{\rm LO}|$ in terms of intuitive metrics such as the size of leave-$i$-out perturbations in active sets, sample size $n$, number of features $p$ and signal-to-noise ratio (SNR). As a consequence, for the $\ell_1$ regularized problems, we show that $|{\rm ALO}-{\rm LO}| \stackrel{p\rightarrow \infty}{\longrightarrow} 0$ while $n/p$ and SNR remain bounded.}
}

Endnote

%0 Conference Paper
%T Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers
%A Arnab Auddy
%A Haolin Zou
%A Kamiar Rahnamarad
%A Arian Maleki
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-auddy24a
%I PMLR
%P 2377--2385
%U https://proceedings.mlr.press/v238/auddy24a.html
%V 238
%X The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding method to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with twice differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, the theoretical understanding of ALO’s error remains unknown. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with the non-differentiable $\ell_1$ regularizer. We bound the error $|{\rm ALO}-{\rm LO}|$ in terms of intuitive metrics such as the size of leave-$i$-out perturbations in active sets, sample size $n$, number of features $p$ and signal-to-noise ratio (SNR). As a consequence, for the $\ell_1$ regularized problems, we show that $|{\rm ALO}-{\rm LO}| \stackrel{p\rightarrow \infty}{\longrightarrow} 0$ while $n/p$ and SNR remain bounded.

APA

Auddy, A., Zou, H., Rahnamarad, K., & Maleki, A. (2024). Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2377-2385. Available from https://proceedings.mlr.press/v238/auddy24a.html.
