Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

Meet Vadera; Brian Jalaian; Benjamin Marlin

Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:719-728, 2020.

Abstract

In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate down-stream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.
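
As an illustration of the two distillation targets named in the abstract, the following minimal sketch computes online Monte Carlo estimates of the posterior predictive distribution and the expected entropy for a single input. It is not the authors' implementation: the `RunningExpectation` class, the `entropy` helper, and the `sampled_probs` values are hypothetical stand-ins for class probabilities produced by posterior samples of a teacher network, and the student-training step of the framework is omitted entirely.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class RunningExpectation:
    """Online Monte Carlo average of a vector-valued quantity."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, value):
        # Incremental mean update: no past samples need to be stored.
        self.count += 1
        if self.mean is None:
            self.mean = list(value)
        else:
            self.mean = [m + (v - m) / self.count
                         for m, v in zip(self.mean, value)]
        return self.mean


# Hypothetical class probabilities p(y | x, theta_s) for one input x,
# one row per posterior sample theta_s (made-up illustrative values).
sampled_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
]

predictive = RunningExpectation()        # estimates E[p(y | x, theta)]
expected_entropy = RunningExpectation()  # estimates E[H(p(y | x, theta))]

for probs in sampled_probs:
    predictive.update(probs)
    expected_entropy.update([entropy(probs)])

print("posterior predictive estimate:", predictive.mean)
print("expected entropy estimate:", expected_entropy.mean[0])
```

Because entropy is concave, the expected entropy estimate can never exceed the entropy of the averaged predictive distribution; the gap between the two is one common way to separate model uncertainty from data uncertainty.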

Cite this Paper

BibTeX


@InProceedings{pmlr-v124-vadera20a,
  title = 	 {Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks},
  author =       {Vadera, Meet and Jalaian, Brian and Marlin, Benjamin},
  booktitle = 	 {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages = 	 {719--728},
  year = 	 {2020},
  editor = 	 {Peters, Jonas and Sontag, David},
  volume = 	 {124},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--06 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v124/vadera20a/vadera20a.pdf},
  url = 	 {https://proceedings.mlr.press/v124/vadera20a.html},
  abstract = 	 {In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework.  The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest.  The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate down-stream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.}
}

Endnote

%0 Conference Paper
%T Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks
%A Meet Vadera
%A Brian Jalaian
%A Benjamin Marlin
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-vadera20a
%I PMLR
%P 719--728
%U https://proceedings.mlr.press/v124/vadera20a.html
%V 124
%X In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework.  The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest.  The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate down-stream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.

APA


Vadera, M., Jalaian, B., & Marlin, B. (2020). Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:719-728. Available from https://proceedings.mlr.press/v124/vadera20a.html.
