Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

Meet Vadera, Brian Jalaian, Benjamin Marlin
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:719-728, 2020.

Abstract

In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework, including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective, and we evaluate downstream tasks that leverage entropy distillation, including uncertainty ranking and out-of-distribution detection.
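To make the procedure concrete, below is a minimal PyTorch-style sketch of one variant of the scheme the abstract describes: the teacher generates posterior samples online via stochastic gradient Langevin dynamics (SGLD), and after each sample the student is nudged toward the statistic computed from that sample. Here the statistic is the predictive entropy H(p(y|x, θ)), whose posterior average E_{p(θ|D)}[H(p(y|x, θ))] is the expected-entropy target. This is a sketch under stated assumptions, not the authors' exact procedure: the names `teacher`, `student`, and `train_loader`, and all hyperparameter values, are illustrative.

    import torch
    import torch.nn.functional as F

    # Hypothetical setup: `teacher` and `student` are torch.nn.Module
    # classifiers (the student ends in a single linear output used for
    # entropy regression), and `train_loader` yields (x, y) minibatches.

    def sgld_step(model, x, y, lr=1e-5, prior_std=1.0, n_train=60000):
        # One SGLD update: descend the gradient of the minibatch-rescaled
        # negative log joint, then inject Gaussian noise with variance 2*lr.
        nll = F.cross_entropy(model(x), y) * n_train
        prior = sum(0.5 * (p ** 2).sum() / prior_std ** 2
                    for p in model.parameters())
        model.zero_grad()
        (nll + prior).backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
                p += torch.randn_like(p) * (2.0 * lr) ** 0.5

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for x, y in train_loader:
        sgld_step(teacher, x, y)  # draw the next posterior sample of the teacher
        with torch.no_grad():
            probs = F.softmax(teacher(x), dim=1)
            # Entropy of this sample's predictive distribution, shape (batch, 1).
            target = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1, keepdim=True)
        # Online compression: regress the student toward the current sample's
        # statistic; across samples, SGD tracks the Monte Carlo expectation.
        loss = F.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

For the posterior predictive target, the same loop applies with the target swapped to the sampled class probabilities and the squared error replaced by a cross-entropy between the teacher sample's predictive distribution and the student's softmax output. The paper also considers drawing distillation inputs from an expanded dataset rather than reusing the training minibatches as done above.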

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-vadera20a,
  title     = {Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks},
  author    = {Vadera, Meet and Jalaian, Brian and Marlin, Benjamin},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages     = {719--728},
  year      = {2020},
  editor    = {Peters, Jonas and Sontag, David},
  volume    = {124},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v124/vadera20a/vadera20a.pdf},
  url       = {https://proceedings.mlr.press/v124/vadera20a.html},
  abstract  = {In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate down-stream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.}
}
Endnote
%0 Conference Paper
%T Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks
%A Meet Vadera
%A Brian Jalaian
%A Benjamin Marlin
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-vadera20a
%I PMLR
%P 719--728
%U https://proceedings.mlr.press/v124/vadera20a.html
%V 124
%X In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate down-stream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.
APA
Vadera, M., Jalaian, B. & Marlin, B. (2020). Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:719-728. Available from https://proceedings.mlr.press/v124/vadera20a.html.
