Zero-Shot Knowledge Distillation in Deep Networks

Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, Venkatesh Babu Radhakrishnan, Anirban Chakraborty
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4743-4751, 2019.

Abstract

Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without using any meta-data, we synthesize Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to the Student via knowledge distillation. We therefore dub our method "Zero-Shot Knowledge Distillation" and demonstrate that our framework achieves generalization performance competitive with that of distillation using the actual training data samples, on multiple benchmark datasets.
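As a rough illustration of the pipeline described in the abstract, the PyTorch sketch below first synthesizes Data Impressions by optimizing random inputs so that the Teacher's temperature-scaled softmax matches target vectors sampled from a Dirichlet distribution, and then distills the Student on those impressions with a standard KL-divergence loss. This is a minimal sketch under assumptions of our own: the uniform Dirichlet concentration, the hyperparameters, and helper names such as synthesize_data_impressions and distill_on_impressions are illustrative and not taken from the paper.

# Minimal sketch of the zero-shot KD idea (assumptions noted above).
import torch
import torch.nn.functional as F

def synthesize_data_impressions(teacher, num_impressions, num_classes,
                                input_shape, steps=200, lr=0.05, temperature=20.0):
    """Optimize random inputs until the Teacher's soft output matches sampled targets."""
    teacher.eval()
    # Sample target softmax vectors (uniform Dirichlet here; an illustrative choice).
    targets = torch.distributions.Dirichlet(
        torch.ones(num_classes)).sample((num_impressions,))
    impressions = torch.randn(num_impressions, *input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([impressions], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = teacher(impressions)
        # Cross-entropy between the sampled soft targets and the Teacher's
        # temperature-scaled softmax drives the inputs toward class-typical patterns.
        log_probs = F.log_softmax(logits / temperature, dim=1)
        loss = -(targets * log_probs).sum(dim=1).mean()
        loss.backward()
        optimizer.step()
    return impressions.detach(), targets

def distill_on_impressions(teacher, student, impressions, epochs=10,
                           lr=1e-3, temperature=20.0):
    """Standard distillation loss, computed only on the synthesized impressions."""
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        optimizer.zero_grad()
        with torch.no_grad():
            t_probs = F.softmax(teacher(impressions) / temperature, dim=1)
        s_log_probs = F.log_softmax(student(impressions) / temperature, dim=1)
        loss = F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2
        loss.backward()
        optimizer.step()
    return student

In practice the impressions would be generated class-wise and the Student trained on mini-batches; the single-batch loops above only show the shape of the computation.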

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-nayak19a,
  title     = {Zero-Shot Knowledge Distillation in Deep Networks},
  author    = {Nayak, Gaurav Kumar and Mopuri, Konda Reddy and Shaj, Vaisakh and Radhakrishnan, Venkatesh Babu and Chakraborty, Anirban},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4743--4751},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/nayak19a/nayak19a.pdf},
  url       = {https://proceedings.mlr.press/v97/nayak19a.html},
  abstract  = {Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method "Zero-Shot Knowledge Distillation" and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets.}
}
Endnote
%0 Conference Paper
%T Zero-Shot Knowledge Distillation in Deep Networks
%A Gaurav Kumar Nayak
%A Konda Reddy Mopuri
%A Vaisakh Shaj
%A Venkatesh Babu Radhakrishnan
%A Anirban Chakraborty
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-nayak19a
%I PMLR
%P 4743--4751
%U https://proceedings.mlr.press/v97/nayak19a.html
%V 97
%X Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method "Zero-Shot Knowledge Distillation" and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets.
APA
Nayak, G.K., Mopuri, K.R., Shaj, V., Radhakrishnan, V.B. & Chakraborty, A. (2019). Zero-Shot Knowledge Distillation in Deep Networks. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4743-4751. Available from https://proceedings.mlr.press/v97/nayak19a.html.
