Decomposing and Editing Predictions by Modeling Model Computation

Harshay Shah, Andrew Ilyas, Aleksander Madry
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:44244-44292, 2024.

Abstract

How does the internal computation of a machine learning model transform inputs into predictions? To tackle this question, we introduce a framework called component modeling for decomposing a model prediction in terms of its components—architectural "building blocks" such as convolution filters or attention heads. We focus on a special case of this framework, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets and modalities. Finally, we show that COAR directly enables effective model editing. Our code is available at github.com/MadryLab/modelcomponents.
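As a rough illustration of component attribution (a sketch of the general idea, not the paper's COAR implementation), one can regress a model's output on binary ablation masks over its components. The Python snippet below assumes NumPy and scikit-learn; model_output is a hypothetical stand-in for a forward pass on a fixed example with the masked components ablated, and the fitted linear coefficients act as per-component attributions.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)

    num_components = 512    # e.g., convolution filters or attention heads
    num_ablations = 2000    # number of random component subsets to ablate

    # Hypothetical stand-in for the model: in practice, model_output(mask)
    # would run a forward pass with the mask's zeroed-out components ablated
    # and return a scalar output such as the correct-class margin.
    true_effects = rng.standard_normal(num_components)
    def model_output(mask):
        return float(mask @ true_effects)

    # Step 1: collect (ablation mask, output) pairs. mask[j] = 0 means
    # component j is ablated; each component is kept with probability 0.95.
    masks = (rng.random((num_ablations, num_components)) < 0.95).astype(float)
    outputs = np.array([model_output(m) for m in masks])

    # Step 2: fit a linear model from masks to outputs. Each coefficient
    # estimates one component's counterfactual effect on this prediction.
    attributions = Ridge(alpha=1.0).fit(masks, outputs).coef_
    print("most influential components:", np.argsort(-np.abs(attributions))[:10])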

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-shah24a,
  title     = {Decomposing and Editing Predictions by Modeling Model Computation},
  author    = {Shah, Harshay and Ilyas, Andrew and Madry, Aleksander},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {44244--44292},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/shah24a/shah24a.pdf},
  url       = {https://proceedings.mlr.press/v235/shah24a.html}
}
APA
Shah, H., Ilyas, A., & Madry, A. (2024). Decomposing and Editing Predictions by Modeling Model Computation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:44244-44292. Available from https://proceedings.mlr.press/v235/shah24a.html.