Zigzag Persistence of Large Language Models Representations

Yuri Gardinazzi, Karthik Viswanathan, Giada Panerai, Alessio Ansuini, Alberto Cazzaniga, Matteo Biagetti
Proceedings of the Geometry, Topology, and Machine Learning Workshop, PMLR 325:120-129, 2026.

Abstract

We analyze internal representations of large language models with zigzag persistent homology, treating depth as a discrete time axis for point clouds of last-token embeddings. At each layer we build a k-nearest-neighbors clique complex, connect adjacent layers via intersections, and summarize the resulting diagrams with effective persistence images. From these we derive two descriptors: Births’ Relative Frequency (at what rate new p-dimensional features appear) and Inter-Layer Persistence (how long they survive across depth). On the SST movie reviews dataset and three open-source models (Llama-3.1, OSS-20B, Phi-4), we consistently observe three evolving phases: early rapid changes, a middle regime of stable organization, and a final reorganization before output. Using the stability signal (inter-layer persistence) to guide where to remove contiguous blocks of layers, we find that pruning within high-persistence regions maintains 5-shot MMLU performance (with the same trend visible even for the more pruning-sensitive OSS-20B). This suggests that zigzag-based summaries capture meaningful, system-level dynamics and can inform lightweight pruning.
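The construction described in the abstract (a k-nearest-neighbors graph per layer, with adjacent layers connected through intersections) can be illustrated with a minimal sketch. This is not the authors' implementation. The Euclidean metric, the value of `k`, the synthetic point clouds, and the helper names `knn_edges` and `zigzag_edge_sequence` are all assumptions made for illustration. The sketch only builds edge sets. It relies on the fact that a clique complex is determined by its graph, so intersecting two clique complexes on the same vertex set is equivalent to intersecting their edge sets. Computing the zigzag persistence itself would require a dedicated library and is left out.

```python
import numpy as np

def knn_edges(points, k):
    """Undirected k-nearest-neighbor edges as a set of (i, j) pairs with i < j."""
    # Pairwise squared Euclidean distances (the metric is an assumption).
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    edges = set()
    for i, row in enumerate(neighbors):
        for j in row:
            j = int(j)
            edges.add((min(i, j), max(i, j)))
    return edges

def zigzag_edge_sequence(layers, k):
    """Alternate per-layer graphs with the intersections of adjacent layers.

    Returns [G_0, G_0 & G_1, G_1, G_1 & G_2, ..., G_{L-1}], so every
    intersection is included in both of its neighbors in the sequence.
    """
    graphs = [knn_edges(x, k) for x in layers]
    sequence = [graphs[0]]
    for prev, nxt in zip(graphs, graphs[1:]):
        sequence.append(prev & nxt)
        sequence.append(nxt)
    return sequence

# Synthetic stand-in for last-token embeddings: the same 50 points,
# perturbed a little more at each "layer" (hypothetical data).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))
layers = [base + 0.1 * t * rng.normal(size=base.shape) for t in range(4)]
seq = zigzag_edge_sequence(layers, k=5)
```

The same set of points is tracked across layers, so vertex `i` always refers to the same input. This is what makes the intersection of adjacent graphs meaningful.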

Cite this Paper


BibTeX
@InProceedings{pmlr-v325-gardinazzi26a,
  title = {Zigzag Persistence of Large Language Models Representations},
  author = {Gardinazzi, Yuri and Viswanathan, Karthik and Panerai, Giada and Ansuini, Alessio and Cazzaniga, Alberto and Biagetti, Matteo},
  booktitle = {Proceedings of the Geometry, Topology, and Machine Learning Workshop},
  pages = {120--129},
  year = {2026},
  editor = {Bleher, Michael and Jensen, Freya and Maier, Levin and Taha, Diaaeldin and Wienhard, Anna},
  volume = {325},
  series = {Proceedings of Machine Learning Research},
  month = {10--14 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v325/main/assets/gardinazzi26a/gardinazzi26a.pdf},
  url = {https://proceedings.mlr.press/v325/gardinazzi26a.html},
  abstract = {We analyze internal representations of large language models with zigzag persistent homology, treating depth as a discrete time axis for point clouds of last-token embeddings. At each layer we build a k-nearest-neighbors clique complex, connect adjacent layers via intersections, and summarize the resulting diagrams with effective persistence images. From these we derive two descriptors: Births’ Relative Frequency (at what rate new p-dimensional features appear) and Inter-Layer Persistence (how long they survive across depth). On the SST movie reviews dataset and three open-source models (Llama-3.1, OSS-20B, Phi-4), we consistently observe three evolving phases: early rapid changes, a middle regime of stable organization, and a final reorganization before output. Using the stability signal (inter-layer persistence) to guide where to remove contiguous blocks of layers, we find that pruning within high-persistence regions maintains 5-shot MMLU performance (with the same trend visible even for the more pruning-sensitive OSS-20B). This suggests that zigzag-based summaries capture meaningful, system-level dynamics and can inform lightweight pruning.}
}
Endnote
%0 Conference Paper
%T Zigzag Persistence of Large Language Models Representations
%A Yuri Gardinazzi
%A Karthik Viswanathan
%A Giada Panerai
%A Alessio Ansuini
%A Alberto Cazzaniga
%A Matteo Biagetti
%B Proceedings of the Geometry, Topology, and Machine Learning Workshop
%C Proceedings of Machine Learning Research
%D 2026
%E Michael Bleher
%E Freya Jensen
%E Levin Maier
%E Diaaeldin Taha
%E Anna Wienhard
%F pmlr-v325-gardinazzi26a
%I PMLR
%P 120--129
%U https://proceedings.mlr.press/v325/gardinazzi26a.html
%V 325
%X We analyze internal representations of large language models with zigzag persistent homology, treating depth as a discrete time axis for point clouds of last-token embeddings. At each layer we build a k-nearest-neighbors clique complex, connect adjacent layers via intersections, and summarize the resulting diagrams with effective persistence images. From these we derive two descriptors: Births’ Relative Frequency (at what rate new p-dimensional features appear) and Inter-Layer Persistence (how long they survive across depth). On the SST movie reviews dataset and three open-source models (Llama-3.1, OSS-20B, Phi-4), we consistently observe three evolving phases: early rapid changes, a middle regime of stable organization, and a final reorganization before output. Using the stability signal (inter-layer persistence) to guide where to remove contiguous blocks of layers, we find that pruning within high-persistence regions maintains 5-shot MMLU performance (with the same trend visible even for the more pruning-sensitive OSS-20B). This suggests that zigzag-based summaries capture meaningful, system-level dynamics and can inform lightweight pruning.
APA
Gardinazzi, Y., Viswanathan, K., Panerai, G., Ansuini, A., Cazzaniga, A., & Biagetti, M. (2026). Zigzag Persistence of Large Language Models Representations. Proceedings of the Geometry, Topology, and Machine Learning Workshop, in Proceedings of Machine Learning Research 325:120-129. Available from https://proceedings.mlr.press/v325/gardinazzi26a.html.
