Position: Understanding LLMs Requires More Than Statistical Generalization

Patrik Reizinger; Szilvia Ujváry; Anna Mészáros; Anna Kerekes; Wieland Brendel; Ferenc Huszár

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:42365-42390, 2024.

Abstract

The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that autoregressive (AR) probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart (and thus with equivalent test loss) can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.
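The non-identifiability claim in the abstract can be made concrete with a toy example. The sketch below is an illustration under assumed conditions, not the authors' construction: a hypothetical two-token vocabulary, length-2 sequences, and a data distribution that never starts with "b". All names (`make_model`, `kl`, `complete`) are invented for this example. Two autoregressive models match the data exactly wherever the data has mass, so their KL divergence from the data is zero, yet they complete the unseen prompt "b" differently.

```python
import math

# Hypothetical toy setup: vocabulary {"a", "b"}, sequences of length 2.
# The data distribution always starts with "a", so the prompt "b" has
# probability zero and is never observed.
FIRST_TOKEN = {"a": 1.0, "b": 0.0}


def make_model(next_after_b):
    """Build an autoregressive model p(x1) * p(x2 | x1).

    All models share p(x1) and p(x2 | x1="a"); only the conditional
    after the unseen prefix "b" is allowed to vary.
    """
    return {
        "first": dict(FIRST_TOKEN),
        "next": {"a": {"a": 0.5, "b": 0.5}, "b": next_after_b},
    }


def seq_prob(model, seq):
    """Probability of a length-2 sequence under the chain rule."""
    x1, x2 = seq
    return model["first"][x1] * model["next"][x1][x2]


def kl(p, q):
    """KL(p || q) over all length-2 sequences, skipping zero-mass terms of p."""
    total = 0.0
    for x1 in "ab":
        for x2 in "ab":
            px = seq_prob(p, (x1, x2))
            if px > 0:
                total += px * math.log(px / seq_prob(q, (x1, x2)))
    return total


def complete(model, prompt):
    """Greedy one-token completion of a one-token prompt."""
    conditional = model["next"][prompt]
    return max(conditional, key=conditional.get)


# The data's conditional after "b" is irrelevant: that prefix has zero mass.
data = make_model({"a": 0.5, "b": 0.5})
model_1 = make_model({"a": 1.0, "b": 0.0})  # after "b", always emit "a"
model_2 = make_model({"a": 0.0, "b": 1.0})  # after "b", always emit "b"

kl_1 = kl(data, model_1)
kl_2 = kl(data, model_2)

print("KL(data || model_1) =", kl_1)
print("KL(data || model_2) =", kl_2)
print("model_1 completes 'b' with", complete(model_1, "b"))
print("model_2 completes 'b' with", complete(model_2, "b"))
```

Both models are indistinguishable by test loss on in-distribution data, so any difference in how they handle the out-of-distribution prompt has to come from something other than statistical generalization, such as inductive bias.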

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-reizinger24a,
  title     = {Position: Understanding {LLM}s Requires More Than Statistical Generalization},
  author    = {Reizinger, Patrik and Ujv\'{a}ry, Szilvia and M\'{e}sz\'{a}ros, Anna and Kerekes, Anna and Brendel, Wieland and Husz\'{a}r, Ferenc},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {42365--42390},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/reizinger24a/reizinger24a.pdf},
  url       = {https://proceedings.mlr.press/v235/reizinger24a.html},
  abstract  = {The last decade has seen blossoming research in deep learning theory attempting to answer, ``Why does deep learning generalize?'' A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart (and thus with equivalent test loss) can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.}
}

Endnote

%0 Conference Paper
%T Position: Understanding LLMs Requires More Than Statistical Generalization
%A Patrik Reizinger
%A Szilvia Ujváry
%A Anna Mészáros
%A Anna Kerekes
%A Wieland Brendel
%A Ferenc Huszár
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-reizinger24a
%I PMLR
%P 42365--42390
%U https://proceedings.mlr.press/v235/reizinger24a.html
%V 235
%X The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart (and thus with equivalent test loss) can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.

APA


Reizinger, P., Ujváry, S., Mészáros, A., Kerekes, A., Brendel, W., & Huszár, F. (2024). Position: Understanding LLMs Requires More Than Statistical Generalization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:42365-42390. Available from https://proceedings.mlr.press/v235/reizinger24a.html.
