Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification
Proceedings of the Thirty Eighth Conference on Learning Theory, PMLR 291:6124-6155, 2025.
Abstract
We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grünwald and Langford (2004) previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst-case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic such under-regularization and overfitting might be, we obtain a precise quantitative description of the worst-case limiting error as a function of the regularization parameter $\lambda$ and the noise level (or approximation error), significantly tightening the analysis of Grünwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.
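For concreteness, one possible reading of the penalized rule described here (a schematic sketch, not spelled out in the abstract) is a two-part-code minimization in which the hypothesis codelength is scaled by $\lambda$; the notation $|c(h)|$ for the codelength of $h$ under the chosen prior or description language, and $\widehat{\mathrm{err}}_S(h)$ for the empirical error on a sample $S$ of size $n$, are assumptions introduced only for illustration:
$$\hat{h}_\lambda \;\in\; \arg\min_{h}\; \Big[\, \lambda\, |c(h)| \;+\; \log_2 \binom{n}{\,n\,\widehat{\mathrm{err}}_S(h)} \,\Big],$$
where the second term is (roughly) the number of bits needed to specify which of the $n$ labels $h$ misclassifies, so $\lambda=1$ recovers an unweighted two-part code and larger $\lambda$ penalizes complex hypotheses more heavily.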