What Mechanisms Does Knowledge Distillation Distill?

Cindy Wu, Ekdeep Singh Lubana, Bruno Kacper Mlodozeniec, Robert Kirk, David Krueger
Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, PMLR 243:60-75, 2024.

Abstract

Knowledge distillation is a commonly-used compression method in ML due to the popularity of increasingly large-scale models, but it is unclear if all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of ‘knowledge’ to investigate how knowledge is transferred during distillation, focusing on shared invariant outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in model for a teacher if it shares the teacher’s learned mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods by which to train such models. While these methods do not result in perfect transfer of mechanisms, we show they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.
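The transfer methods named in the abstract can be illustrated with a short sketch. Below is a minimal PyTorch-style example of a distillation objective combining the standard teacher-to-student KL term with an input-gradient ("Jacobian matching") penalty. The temperature, the gradient-of-summed-logits proxy for the full Jacobian, and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact setup): a distillation objective that
# combines soft-label KL matching with an input-gradient ("Jacobian matching")
# penalty. Temperature T and the summed-logits Jacobian proxy are assumptions.
import torch
import torch.nn.functional as F


def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    """Teacher-to-student KL divergence on temperature-softened outputs."""
    s_log_probs = F.log_softmax(student_logits / T, dim=-1)
    t_probs = F.softmax(teacher_logits / T, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target;
    # the T^2 factor follows standard distillation practice.
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * T * T


def jacobian_match_loss(student, teacher, x):
    """Penalize mismatch between student and teacher input gradients.

    Uses the gradient of the summed logits as a cheap stand-in for the full
    Jacobian; this is a common approximation, not necessarily the paper's.
    """
    x = x.detach().clone().requires_grad_(True)
    s_grad = torch.autograd.grad(student(x).sum(), x, create_graph=True)[0]
    t_grad = torch.autograd.grad(teacher(x).sum(), x)[0]
    return F.mse_loss(s_grad, t_grad.detach())
```

In training, such terms would typically be added to the usual cross-entropy loss with scalar weights, e.g. `loss = ce + alpha * kd_kl_loss(...) + beta * jacobian_match_loss(...)`.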

Cite this Paper


BibTeX
@InProceedings{pmlr-v243-wu24a,
  title     = {What Mechanisms Does Knowledge Distillation Distill?},
  author    = {Wu, Cindy and Lubana, Ekdeep Singh and Mlodozeniec, Bruno Kacper and Kirk, Robert and Krueger, David},
  booktitle = {Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models},
  pages     = {60--75},
  year      = {2024},
  editor    = {Fumero, Marco and Rodolà, Emanuele and Domine, Clementine and Locatello, Francesco and Dziugaite, Karolina and Caron, Mathilde},
  volume    = {243},
  series    = {Proceedings of Machine Learning Research},
  month     = {15 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v243/wu24a/wu24a.pdf},
  url       = {https://proceedings.mlr.press/v243/wu24a.html},
  abstract  = {Knowledge distillation is a commonly-used compression method in ML due to the popularity of increasingly large-scale models, but it is unclear if all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of ‘knowledge’ to investigate how knowledge is transferred during distillation, focusing on shared invariant outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in model for a teacher if it shares the teacher’s learned mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods by which to train such models. While these methods do not result in perfect transfer of mechanisms, we show they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.}
}
Endnote
%0 Conference Paper
%T What Mechanisms Does Knowledge Distillation Distill?
%A Cindy Wu
%A Ekdeep Singh Lubana
%A Bruno Kacper Mlodozeniec
%A Robert Kirk
%A David Krueger
%B Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Emanuele Rodolà
%E Clementine Domine
%E Francesco Locatello
%E Karolina Dziugaite
%E Mathilde Caron
%F pmlr-v243-wu24a
%I PMLR
%P 60--75
%U https://proceedings.mlr.press/v243/wu24a.html
%V 243
%X Knowledge distillation is a commonly-used compression method in ML due to the popularity of increasingly large-scale models, but it is unclear if all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of ‘knowledge’ to investigate how knowledge is transferred during distillation, focusing on shared invariant outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in model for a teacher if it shares the teacher’s learned mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods by which to train such models. While these methods do not result in perfect transfer of mechanisms, we show they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.
APA
Wu, C., Lubana, E.S., Mlodozeniec, B.K., Kirk, R. & Krueger, D. (2024). What Mechanisms Does Knowledge Distillation Distill? Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 243:60-75. Available from https://proceedings.mlr.press/v243/wu24a.html.