Knowledge Transfer with Jacobian Matching

Suraj Srinivas, Francois Fleuret
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4723-4731, 2018.

Abstract

Classical distillation methods transfer representations from a “teacher” neural network to a “student” network by matching their output activations. Recent methods also match the Jacobians, i.e., the gradients of the output activations with respect to the input. However, this involves some ad hoc decisions, in particular the choice of loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning, by establishing the equivalence of a recent transfer learning procedure to distillation. Finally, we show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.
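
In outline, the equivalence works as follows: if T and S are the teacher and student, and ξ is zero-mean isotropic noise with per-coordinate variance σ², a first-order Taylor expansion gives E_ξ[(S(x+ξ) − T(x+ξ))²] ≈ (S(x) − T(x))² + σ² ||∇S(x) − ∇T(x)||², since the cross term vanishes in expectation. Squared-error distillation on noise-perturbed inputs thus decomposes into an output-matching term plus a squared-error penalty on the input gradients, which motivates a squared-error loss for Jacobian matching. Below is a minimal PyTorch-style sketch of such a combined loss; it is our illustration, not the authors' released code. To keep one backward pass per network, it matches only one row of the Jacobian per sample (the teacher's top-class logit), and the weight alpha is an illustrative hyperparameter.

import torch
import torch.nn.functional as F

def distillation_with_jacobian(student, teacher, x, alpha=1.0):
    """Output matching plus a squared-error Jacobian penalty (sketch).

    Assumptions: `student` and `teacher` map a batch of inputs to
    logits; the teacher is frozen and in eval mode; `alpha` is an
    illustrative weight, not a value from the paper.
    """
    # Enable gradients with respect to the input itself.
    x = x.clone().detach().requires_grad_(True)
    s_logits = student(x)
    t_logits = teacher(x)

    # Classical distillation term: match output activations.
    out_loss = F.mse_loss(s_logits, t_logits.detach())

    # One Jacobian row per sample: the gradient of the logit of the
    # teacher's predicted class with respect to the input.
    cls = t_logits.argmax(dim=1, keepdim=True)
    s_sel = s_logits.gather(1, cls).sum()
    t_sel = t_logits.gather(1, cls).sum()

    # create_graph=True makes the penalty differentiable with respect
    # to the student's parameters; the teacher's row is a constant.
    s_jac = torch.autograd.grad(s_sel, x, create_graph=True)[0]
    t_jac = torch.autograd.grad(t_sel, x)[0]

    # Squared-error loss on the Jacobian rows, matching the noise view.
    jac_loss = F.mse_loss(s_jac, t_jac)

    return out_loss + alpha * jac_loss

In a training loop this loss would typically be added to the usual cross-entropy on the ground-truth labels; the decomposition above also suggests why such penalties improve robustness to noisy inputs.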

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-srinivas18a,
  title     = {Knowledge Transfer with {J}acobian Matching},
  author    = {Srinivas, Suraj and Fleuret, Francois},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {4723--4731},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/srinivas18a/srinivas18a.pdf},
  url       = {https://proceedings.mlr.press/v80/srinivas18a.html},
  abstract  = {Classical distillation methods transfer representations from a “teacher” neural network to a “student” network by matching their output activations. Recent methods also match the Jacobians, or the gradient of output activations with the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.}
}
Endnote
%0 Conference Paper
%T Knowledge Transfer with Jacobian Matching
%A Suraj Srinivas
%A Francois Fleuret
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-srinivas18a
%I PMLR
%P 4723--4731
%U https://proceedings.mlr.press/v80/srinivas18a.html
%V 80
%X Classical distillation methods transfer representations from a “teacher” neural network to a “student” network by matching their output activations. Recent methods also match the Jacobians, or the gradient of output activations with the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.
APA
Srinivas, S. & Fleuret, F. (2018). Knowledge Transfer with Jacobian Matching. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4723-4731. Available from https://proceedings.mlr.press/v80/srinivas18a.html.
