Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification

Xiaohan Zhu, Nathan Srebro
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:6124-6155, 2025.

Abstract

We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grünwald and Langford (2004) previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst case limiting error as a function of the regularization parameter $\lambda$ and noise level (or approximation error), significantly tightening the analysis of Grünwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.
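For concreteness, here is a schematic form of the learning rule the abstract describes, in our own notation (the paper's exact normalization of the modified rule may differ). Given a description language assigning each hypothesis $h$ a codelength $|h|$ in bits (so that $\sum_h 2^{-|h|} \le 1$ by Kraft's inequality) and a sample $S$ of size $n$ with empirical error $\widehat{\mathrm{err}}_S(h)$, the $\lambda$-penalized two-part-code rule selects

$$\hat{h}_\lambda(S) \in \arg\min_{h} \; \widehat{\mathrm{err}}_S(h) + \lambda\,\frac{|h|}{n},$$

where $\lambda=1$ corresponds to the classical two-part-code MDL rule analyzed by Grünwald and Langford (2004), and larger (smaller) $\lambda$ gives stronger (weaker) regularization. Under this reading, the paper's result characterizes, for each $\lambda$ and each noise level, the worst case error of $\hat{h}_\lambda$ in the limit $n \to \infty$.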

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-zhu25a,
  title     = {Quantifying Overfitting along the Regularization Path for Two-Part-Code {MDL} in Supervised Classification},
  author    = {Zhu, Xiaohan and Srebro, Nathan},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {6124--6155},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/zhu25a/zhu25a.pdf},
  url       = {https://proceedings.mlr.press/v291/zhu25a.html},
  abstract  = {We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Gr{ü}nwald and Langford (2004) previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst case limiting error as a function of the regularization parameter $\lambda$ and noise level (or approximation error), significantly tightening the analysis of Gr{ü}nwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.}
}
Endnote
%0 Conference Paper
%T Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification
%A Xiaohan Zhu
%A Nathan Srebro
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-zhu25a
%I PMLR
%P 6124--6155
%U https://proceedings.mlr.press/v291/zhu25a.html
%V 291
%X We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grünwald and Langford (2004) previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst case limiting error as a function of the regularization parameter $\lambda$ and noise level (or approximation error), significantly tightening the analysis of Grünwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.
APA
Zhu, X. & Srebro, N. (2025). Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:6124-6155. Available from https://proceedings.mlr.press/v291/zhu25a.html.
