Linear Explanations for Individual Neurons

Tuomas Oikarinen; Tsui-Wei Weng

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:38639-38662, 2024.

Abstract

In recent years, many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically focus only on explaining the very highest activations of a neuron. In this paper we show that this is not sufficient, and that the highest activation range is responsible for only a very small percentage of the neuron’s causal effect. In addition, inputs causing lower activations are often very different and can’t be reliably predicted by looking only at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e. predicting neuron activations on unseen inputs, in the vision setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-oikarinen24a,
  title     = {Linear Explanations for Individual Neurons},
  author    = {Oikarinen, Tuomas and Weng, Tsui-Wei},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {38639--38662},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/oikarinen24a/oikarinen24a.pdf},
  url       = {https://proceedings.mlr.press/v235/oikarinen24a.html},
  abstract  = {In recent years many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically only focus on explaining the very highest activations of a neuron. In this paper we show this is not sufficient, and that the highest activation range is only responsible for a very small percentage of the neuron’s causal effect. In addition, inputs causing lower activations are often very different and can’t be reliably predicted by only looking at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e. predicting neuron activations on unseen inputs in vision setting.}
}

Endnote

%0 Conference Paper
%T Linear Explanations for Individual Neurons
%A Tuomas Oikarinen
%A Tsui-Wei Weng
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-oikarinen24a
%I PMLR
%P 38639--38662
%U https://proceedings.mlr.press/v235/oikarinen24a.html
%V 235
%X In recent years many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically only focus on explaining the very highest activations of a neuron. In this paper we show this is not sufficient, and that the highest activation range is only responsible for a very small percentage of the neuron’s causal effect. In addition, inputs causing lower activations are often very different and can’t be reliably predicted by only looking at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e. predicting neuron activations on unseen inputs in vision setting.

APA


Oikarinen, T. & Weng, T. (2024). Linear Explanations for Individual Neurons. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:38639-38662. Available from https://proceedings.mlr.press/v235/oikarinen24a.html.
