Adversarial Distillation of Bayesian Neural Network Posteriors

Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5190-5199, 2018.

Abstract

Bayesian neural networks (BNNs) allow us to reason about uncertainty in a principled way. Stochastic Gradient Langevin Dynamics (SGLD) enables efficient BNN learning by drawing samples from the BNN posterior using mini-batches. However, SGLD and its extensions require storage of many copies of the model parameters, a potentially prohibitive cost, especially for large neural networks. We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN). At test-time, samples are generated by the GAN. We show that this distillation framework incurs no loss in performance on recent BNN applications including anomaly detection, active learning, and defense against adversarial attacks. By construction, our framework distills not only the Bayesian predictive distribution, but the posterior itself. This allows one to compute quantities such as the approximate model variance, which is useful in downstream tasks. To our knowledge, these are the first results applying MCMC-based BNNs to the aforementioned applications.
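As a rough illustration of the two-stage pipeline the abstract describes (and not the authors' implementation), the sketch below first collects SGLD samples of a toy network's weights and then trains a small GAN on the flattened weight vectors, so that at test time weight samples come from the generator. The toy model, hyperparameters, and GAN architecture are placeholder assumptions made for illustration only.

```python
# Illustrative sketch only: SGLD posterior sampling followed by GAN distillation
# of the collected weight samples. All sizes and hyperparameters are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data (placeholder for a real dataset).
X = torch.randn(256, 4)
y = X @ torch.randn(4, 1) + 0.1 * torch.randn(256, 1)

net = nn.Linear(4, 1)                      # stand-in "BNN", kept tiny
lr = 1e-3
posterior_samples = []

# Stage 1: SGLD -- noisy gradient steps on the mini-batch negative log-posterior
# (constant factors omitted for brevity).
for step in range(2000):
    idx = torch.randint(0, 256, (32,))
    nll = ((net(X[idx]) - y[idx]) ** 2).mean() * 256       # rescaled mini-batch NLL
    prior = sum((p ** 2).sum() for p in net.parameters())  # Gaussian prior term
    loss = nll + prior
    net.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in net.parameters():
            # Langevin update: gradient step plus Gaussian noise of variance 2*lr.
            p += -lr * p.grad + (2 * lr) ** 0.5 * torch.randn_like(p)
    if step >= 1000 and step % 5 == 0:                      # burn-in, then thin
        posterior_samples.append(
            torch.cat([p.detach().flatten() for p in net.parameters()]))

W = torch.stack(posterior_samples)         # (num_samples, num_params)
d = W.shape[1]

# Stage 2: distill the stored samples into a GAN over flattened weight vectors.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, d))
D = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for it in range(500):
    real = W[torch.randint(0, W.shape[0], (64,))]
    fake = G(torch.randn(64, 8))
    # Discriminator update on real vs. generated weight vectors.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator update.
    g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# At test time, generated weight vectors replace the stored SGLD samples,
# e.g. to estimate per-parameter posterior variance without keeping W around.
approx_var = G(torch.randn(1024, 8)).var(dim=0)
```

In this toy setting the stored sample matrix is tiny, but for large networks the generator stands in for the many stored parameter copies, which is the memory saving the abstract refers to.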

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-wang18i,
  title     = {Adversarial Distillation of {B}ayesian Neural Network Posteriors},
  author    = {Wang, Kuan-Chieh and Vicol, Paul and Lucas, James and Gu, Li and Grosse, Roger and Zemel, Richard},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {5190--5199},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/wang18i/wang18i.pdf},
  url       = {https://proceedings.mlr.press/v80/wang18i.html}
}
APA
Wang, K., Vicol, P., Lucas, J., Gu, L., Grosse, R. & Zemel, R. (2018). Adversarial Distillation of Bayesian Neural Network Posteriors. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5190-5199. Available from https://proceedings.mlr.press/v80/wang18i.html.
