Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Charles O’Neill, Alim Gumran, David Klindt
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:46877-46896, 2025.

Abstract

A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. Using compressed sensing theory, we prove that an SAE encoder is inherently insufficient for accurate sparse inference, even in solvable cases. We then decouple encoding and decoding processes to empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our results reveal substantial performance gains with minimal compute increases in correct inference of sparse codes. We demonstrate this generalises to SAEs applied to large language models, where more expressive encoders achieve greater interpretability. This work opens new avenues for understanding neural network representations and analysing large language model activations.
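The contrast the abstract draws can be sketched numerically. Below, a hypothetical toy setup compares a one-shot SAE-style encoder (a single linear map plus ReLU, here with tied weights) against an iterative sparse inference method (ISTA, standing in for the "more sophisticated" inference the paper discusses) on the same dictionary; none of this is the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 16, 64, 3  # data dim, dictionary size, sparsity level

# Random dictionary with unit-norm columns (the "decoder").
D = rng.normal(size=(n, m))
D /= np.linalg.norm(D, axis=0)

# A k-sparse non-negative ground-truth code and its observation x = D z.
z_true = np.zeros(m)
z_true[rng.choice(m, k, replace=False)] = rng.uniform(1.0, 2.0, k)
x = D @ z_true

# (1) Amortised SAE-style encoder: one linear map + ReLU
# (tied weights D.T used here purely for illustration).
z_sae = np.maximum(D.T @ x, 0.0)

# (2) Iterative inference: ISTA on 0.5 * ||x - D z||^2 + lam * ||z||_1.
lam = 0.1
step = 1.0 / np.linalg.norm(D, 2) ** 2  # inverse Lipschitz constant
z = np.zeros(m)
for _ in range(500):
    v = z - step * (D.T @ (D @ z - x))      # gradient step
    z = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)  # soft-threshold

err_sae = np.linalg.norm(x - D @ z_sae)
err_ista = np.linalg.norm(x - D @ z)
```

On this toy problem the iterative solver reconstructs `x` far more accurately than the single-layer encoder, illustrating (not proving) the amortisation gap the paper formalises.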

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-o-neill25a,
  title     = {Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders},
  author    = {O'Neill, Charles and Gumran, Alim and Klindt, David},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {46877--46896},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/o-neill25a/o-neill25a.pdf},
  url       = {https://proceedings.mlr.press/v267/o-neill25a.html},
  abstract  = {A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. Using compressed sensing theory, we prove that an SAE encoder is inherently insufficient for accurate sparse inference, even in solvable cases. We then decouple encoding and decoding processes to empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our results reveal substantial performance gains with minimal compute increases in correct inference of sparse codes. We demonstrate this generalises to SAEs applied to large language models, where more expressive encoders achieve greater interpretability. This work opens new avenues for understanding neural network representations and analysing large language model activations.}
}
Endnote
%0 Conference Paper
%T Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
%A Charles O'Neill
%A Alim Gumran
%A David Klindt
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-o-neill25a
%I PMLR
%P 46877--46896
%U https://proceedings.mlr.press/v267/o-neill25a.html
%V 267
%X A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. Using compressed sensing theory, we prove that an SAE encoder is inherently insufficient for accurate sparse inference, even in solvable cases. We then decouple encoding and decoding processes to empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our results reveal substantial performance gains with minimal compute increases in correct inference of sparse codes. We demonstrate this generalises to SAEs applied to large language models, where more expressive encoders achieve greater interpretability. This work opens new avenues for understanding neural network representations and analysing large language model activations.
APA
O'Neill, C., Gumran, A., & Klindt, D. (2025). Compute optimal inference and provable amortisation gap in sparse autoencoders. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:46877-46896. Available from https://proceedings.mlr.press/v267/o-neill25a.html.