Mutual Information Neural Estimation

Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:531-540, 2018.

Abstract

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.
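The abstract does not spell out the objective being optimized. As a rough illustration only: the published MINE method builds on the Donsker-Varadhan lower bound, I(X;Z) >= E_joint[T] - log E_marginals[exp(T)], maximized over a critic T. The sketch below evaluates that bound with NumPy. It is not the authors' implementation: no neural network is trained, the critic is a closed-form log density ratio for a correlated Gaussian pair (an assumption chosen so the answer is known), and the names `dv_lower_bound`, `optimal_gaussian_critic`, and `gaussian_demo` are hypothetical.

```python
import numpy as np


def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan estimate: mean(T on joint) - log(mean(exp(T on marginals)))."""
    t_joint = np.asarray(t_joint, dtype=float)
    t_marginal = np.asarray(t_marginal, dtype=float)
    # Stable log-mean-exp: subtract the maximum before exponentiating.
    m = t_marginal.max()
    log_mean_exp = m + np.log(np.mean(np.exp(t_marginal - m)))
    return t_joint.mean() - log_mean_exp


def optimal_gaussian_critic(x, z, rho):
    """Log density ratio log p(x,z) / (p(x) p(z)) for a standard bivariate Gaussian.

    Stands in for the trained network; in MINE the critic is learned by gradient ascent.
    """
    quad = (rho**2 * x**2 - 2.0 * rho * x * z + rho**2 * z**2) / (2.0 * (1.0 - rho**2))
    return -0.5 * np.log(1.0 - rho**2) - quad


def gaussian_demo(rho=0.5, n=100_000, seed=0):
    """Return (estimated MI, analytic MI) in nats for correlation rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Assumption: shuffling z approximates samples from the product of marginals.
    z_shuffled = rng.permutation(z)
    estimate = dv_lower_bound(
        optimal_gaussian_critic(x, z, rho),
        optimal_gaussian_critic(x, z_shuffled, rho),
    )
    true_mi = -0.5 * np.log(1.0 - rho**2)
    return estimate, true_mi
```

Calling `gaussian_demo()` returns the sample estimate alongside the analytic value, so the two can be compared directly. With a learned critic, the same bound would be maximized by gradient ascent over the network parameters.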

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-belghazi18a,
  title = {Mutual Information Neural Estimation},
  author = {Belghazi, Mohamed Ishmael and Baratin, Aristide and Rajeshwar, Sai and Ozair, Sherjil and Bengio, Yoshua and Courville, Aaron and Hjelm, Devon},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {531--540},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/belghazi18a/belghazi18a.pdf},
  url = {http://proceedings.mlr.press/v80/belghazi18a.html},
  abstract = {We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.}
}
Endnote
%0 Conference Paper
%T Mutual Information Neural Estimation
%A Mohamed Ishmael Belghazi
%A Aristide Baratin
%A Sai Rajeshwar
%A Sherjil Ozair
%A Yoshua Bengio
%A Aaron Courville
%A Devon Hjelm
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-belghazi18a
%I PMLR
%P 531--540
%U http://proceedings.mlr.press/v80/belghazi18a.html
%V 80
%X We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.
APA
Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., & Hjelm, D. (2018). Mutual Information Neural Estimation. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:531-540. Available from http://proceedings.mlr.press/v80/belghazi18a.html.
