Quantifying Memory Utilization with Effective State-Size

Rom Parnichkun, Neehal Tumma, Armin W Thomas, Alessandro Moro, Qi An, Taiji Suzuki, Atsushi Yamashita, Michael Poli, Stefano Massaroli
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:48276-48334, 2025.

Abstract

As the space of causal sequence modeling architectures continues to grow, the need to develop a general framework for their analysis becomes increasingly important. With this aim, we draw insights from classical signal processing and control theory to develop a quantitative measure of memory utilization: the internal mechanisms through which a model stores past information to produce future outputs. This metric, which we call effective state-size (ESS), is tailored to the fundamental class of systems with input-invariant and input-varying linear operators, encompassing a variety of computational units such as variants of attention, convolutions, and recurrences. Unlike prior work on memory utilization, which either relies on raw operator visualizations (e.g. attention maps) or simply the total memory capacity (i.e. cache size) of a model, our metrics provide highly interpretable and actionable measurements. In particular, we show how ESS can be leveraged to improve initialization strategies, inform novel regularizers, and advance the performance-efficiency frontier through model distillation. Furthermore, we demonstrate that the effect of context delimiters (such as end-of-speech tokens) on ESS highlights cross-architectural differences in how large language models utilize their available memory to recall information. Overall, we find that ESS provides valuable insights into the dynamics that dictate memory utilization, enabling the design of more efficient and effective sequence models.
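The abstract does not spell out how ESS is computed, but its control-theoretic framing suggests a rank-based measure on the materialized sequence operator. The sketch below is a minimal illustration under that assumption: for a causal linear operator T (e.g. a masked attention matrix), the ESS at position t is taken as the numerical rank of the block T[t:, :t] that carries past inputs into future outputs. The function name effective_state_size and the tolerance-based rank are illustrative choices, not the authors' implementation.

import numpy as np

def effective_state_size(T, t, tol=1e-6):
    # Hypothetical ESS estimate at position t for a causal linear operator T.
    # T is an (L x L) lower-triangular matrix mapping an input sequence to an
    # output sequence (e.g. a materialized attention or convolution operator).
    # The block T[t:, :t] maps past inputs to future outputs; its numerical rank
    # bounds the state dimension a recurrence would need to reproduce T at time t.
    past_to_future = T[t:, :t]
    if past_to_future.size == 0:
        return 0
    s = np.linalg.svd(past_to_future, compute_uv=False)
    return int(np.sum(s > tol * s.max()))

# Example: a random causal (masked, row-normalized) attention-like operator.
L = 8
rng = np.random.default_rng(0)
scores = np.exp(rng.normal(size=(L, L)))
T = np.tril(scores)
T = T / T.sum(axis=1, keepdims=True)
print([effective_state_size(T, t) for t in range(1, L)])

On such a dense causal operator the numerical rank of each block is generically min(L - t, t), i.e. the full cache size; the interesting cases in the paper's framing are structured operators (recurrences, convolutions, gated attention) whose ESS falls below that ceiling.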

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-parnichkun25a,
  title     = {Quantifying Memory Utilization with Effective State-Size},
  author    = {Parnichkun, Rom and Tumma, Neehal and Thomas, Armin W and Moro, Alessandro and An, Qi and Suzuki, Taiji and Yamashita, Atsushi and Poli, Michael and Massaroli, Stefano},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {48276--48334},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/parnichkun25a/parnichkun25a.pdf},
  url       = {https://proceedings.mlr.press/v267/parnichkun25a.html},
  abstract  = {As the space of causal sequence modeling architectures continues to grow, the need to develop a general framework for their analysis becomes increasingly important. With this aim, we draw insights from classical signal processing and control theory, to develop a quantitative measure of memory utilization: the internal mechanisms through which a model stores past information to produce future outputs. This metric, which we call effective state-size (ESS), is tailored to the fundamental class of systems with input-invariant and input-varying linear operators, encompassing a variety of computational units such as variants of attention, convolutions, and recurrences. Unlike prior work on memory utilization, which either relies on raw operator visualizations (e.g. attention maps), or simply the total memory capacity (i.e. cache size) of a model, our metrics provide highly interpretable and actionable measurements. In particular, we show how ESS can be leveraged to improve initialization strategies, inform novel regularizers and advance the performance-efficiency frontier through model distillation. Furthermore, we demonstrate that the effect of context delimiters (such as end-of-speech tokens) on ESS highlights cross-architectural differences in how large language models utilize their available memory to recall information. Overall, we find that ESS provides valuable insights into the dynamics that dictate memory utilization, enabling the design of more efficient and effective sequence models.}
}
Endnote
%0 Conference Paper
%T Quantifying Memory Utilization with Effective State-Size
%A Rom Parnichkun
%A Neehal Tumma
%A Armin W Thomas
%A Alessandro Moro
%A Qi An
%A Taiji Suzuki
%A Atsushi Yamashita
%A Michael Poli
%A Stefano Massaroli
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-parnichkun25a
%I PMLR
%P 48276--48334
%U https://proceedings.mlr.press/v267/parnichkun25a.html
%V 267
%X As the space of causal sequence modeling architectures continues to grow, the need to develop a general framework for their analysis becomes increasingly important. With this aim, we draw insights from classical signal processing and control theory, to develop a quantitative measure of memory utilization: the internal mechanisms through which a model stores past information to produce future outputs. This metric, which we call effective state-size (ESS), is tailored to the fundamental class of systems with input-invariant and input-varying linear operators, encompassing a variety of computational units such as variants of attention, convolutions, and recurrences. Unlike prior work on memory utilization, which either relies on raw operator visualizations (e.g. attention maps), or simply the total memory capacity (i.e. cache size) of a model, our metrics provide highly interpretable and actionable measurements. In particular, we show how ESS can be leveraged to improve initialization strategies, inform novel regularizers and advance the performance-efficiency frontier through model distillation. Furthermore, we demonstrate that the effect of context delimiters (such as end-of-speech tokens) on ESS highlights cross-architectural differences in how large language models utilize their available memory to recall information. Overall, we find that ESS provides valuable insights into the dynamics that dictate memory utilization, enabling the design of more efficient and effective sequence models.
APA
Parnichkun, R., Tumma, N., Thomas, A.W., Moro, A., An, Q., Suzuki, T., Yamashita, A., Poli, M. & Massaroli, S. (2025). Quantifying Memory Utilization with Effective State-Size. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:48276-48334. Available from https://proceedings.mlr.press/v267/parnichkun25a.html.
