Grokking Group Multiplication with Cosets

Dashiell Stander, Qinan Yu, Honglu Fan, Stella Biderman
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46441-46467, 2024.

Abstract

The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. There have been many techniques developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden layer networks that have “grokked” the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group’s subgroups. We relate how we reverse engineered the model’s mechanisms and confirmed our theory was a faithful description of the circuit’s functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. (2023) which alleges to find a different algorithm for this same problem.
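As a concrete illustration of the coset structure the abstract refers to (a minimal sketch of the underlying group theory, not the paper's reverse-engineered circuit), the sketch below builds $S_5$ as permutation tuples and checks that the two cosets of the subgroup $A_5$ partition the group, and that the coset of a product is determined by the cosets of the factors, so group multiplication "factors through" the subgroup structure:

```python
from itertools import permutations

# Permutations of {0,...,4} as tuples; composition (p ∘ q)(i) = p[q[i]]
def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def parity(p):
    # Parity via inversion count: 0 = even (in A5), 1 = odd
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return inv % 2

S5 = list(permutations(range(5)))
A5 = [p for p in S5 if parity(p) == 0]  # the index-2 subgroup A5
assert len(S5) == 120 and len(A5) == 60

# A5 is normal in S5, so the coset of p∘q depends only on the
# cosets of p and q (parity adds mod 2):
for p in S5[:20]:
    for q in S5[:20]:
        assert parity(compose(p, q)) == (parity(p) + parity(q)) % 2
```

The same principle applies to the other subgroups of $S_5$ and $S_6$ discussed in the paper; parity and $A_5$ are used here only because they give the smallest self-contained check.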

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-stander24a,
  title =     {Grokking Group Multiplication with Cosets},
  author =    {Stander, Dashiell and Yu, Qinan and Fan, Honglu and Biderman, Stella},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages =     {46441--46467},
  year =      {2024},
  editor =    {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume =    {235},
  series =    {Proceedings of Machine Learning Research},
  month =     {21--27 Jul},
  publisher = {PMLR},
  pdf =       {https://raw.githubusercontent.com/mlresearch/v235/main/assets/stander24a/stander24a.pdf},
  url =       {https://proceedings.mlr.press/v235/stander24a.html},
  abstract =  {The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. There have been many techniques developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden layer networks that have “grokked” the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group’s subgroups. We relate how we reverse engineered the model’s mechanisms and confirmed our theory was a faithful description of the circuit’s functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. (2023) which alleges to find a different algorithm for this same problem.}
}
Endnote
%0 Conference Paper
%T Grokking Group Multiplication with Cosets
%A Dashiell Stander
%A Qinan Yu
%A Honglu Fan
%A Stella Biderman
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-stander24a
%I PMLR
%P 46441--46467
%U https://proceedings.mlr.press/v235/stander24a.html
%V 235
%X The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. There have been many techniques developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden layer networks that have “grokked” the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group’s subgroups. We relate how we reverse engineered the model’s mechanisms and confirmed our theory was a faithful description of the circuit’s functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. (2023) which alleges to find a different algorithm for this same problem.
APA
Stander, D., Yu, Q., Fan, H., &amp; Biderman, S. (2024). Grokking Group Multiplication with Cosets. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46441-46467. Available from https://proceedings.mlr.press/v235/stander24a.html.