Pathwise Derivatives Beyond the Reparameterization Trick

Martin Jankowiak, Fritz Obermeyer
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2235-2244, 2018.

Abstract

We observe that gradients computed via the reparameterization trick are in direct correspondence with solutions of the transport equation in the formalism of optimal transport. We use this perspective to compute (approximate) pathwise gradients for probability distributions not directly amenable to the reparameterization trick: Gamma, Beta, and Dirichlet. We further observe that when the reparameterization trick is applied to the Cholesky-factorized multivariate Normal distribution, the resulting gradients are suboptimal in the sense of optimal transport. We derive the optimal gradients and show that they have reduced variance in a Gaussian Process regression task. We demonstrate with a variety of synthetic experiments and stochastic variational inference tasks that our pathwise gradients are competitive with other methods.
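As a minimal sketch of the correspondence noted in the abstract (not code from the paper), the Python snippet below checks one case numerically. For a univariate Normal, the reparameterization derivatives dz/dmu and dz/dsigma should agree with the velocity v = -(dF/dtheta) / q computed from the CDF F and density q. That identity is the standard one-dimensional solution of the transport equation; it is assumed here, not quoted from the abstract. All function names and the test point are illustrative.

```python
import math


def normal_pdf(z, mu, sigma):
    """Density q(z) of a Normal(mu, sigma)."""
    u = (z - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z, mu, sigma):
    """CDF F(z) of a Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def cdf_velocity(z, mu, sigma, eps=1e-6):
    """Velocity v = -(dF/dtheta) / q, with dF/dtheta from central differences."""
    q = normal_pdf(z, mu, sigma)
    dF_dmu = (normal_cdf(z, mu + eps, sigma)
              - normal_cdf(z, mu - eps, sigma)) / (2.0 * eps)
    dF_dsigma = (normal_cdf(z, mu, sigma + eps)
                 - normal_cdf(z, mu, sigma - eps)) / (2.0 * eps)
    return -dF_dmu / q, -dF_dsigma / q


def reparam_velocity(z, mu, sigma):
    """dz/dmu and dz/dsigma for z = mu + sigma * noise at fixed noise."""
    return 1.0, (z - mu) / sigma


# Hypothetical test point; any sigma > 0 and moderate z should work.
mu, sigma, z = 0.3, 1.7, 1.1
v_cdf = cdf_velocity(z, mu, sigma)
v_rep = reparam_velocity(z, mu, sigma)
print("CDF-based velocity:      ", v_cdf)
print("reparameterized velocity:", v_rep)
```

The CDF-based form only needs F and q, which is why it can, in principle, be applied to distributions such as the Gamma that lack a simple location-scale reparameterization.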

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-jankowiak18a,
  title = {Pathwise Derivatives Beyond the Reparameterization Trick},
  author = {Jankowiak, Martin and Obermeyer, Fritz},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {2235--2244},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/jankowiak18a/jankowiak18a.pdf},
  url = {https://proceedings.mlr.press/v80/jankowiak18a.html},
  abstract = {We observe that gradients computed via the reparameterization trick are in direct correspondence with solutions of the transport equation in the formalism of optimal transport. We use this perspective to compute (approximate) pathwise gradients for probability distributions not directly amenable to the reparameterization trick: Gamma, Beta, and Dirichlet. We further observe that when the reparameterization trick is applied to the Cholesky-factorized multivariate Normal distribution, the resulting gradients are suboptimal in the sense of optimal transport. We derive the optimal gradients and show that they have reduced variance in a Gaussian Process regression task. We demonstrate with a variety of synthetic experiments and stochastic variational inference tasks that our pathwise gradients are competitive with other methods.}
}
Endnote
%0 Conference Paper
%T Pathwise Derivatives Beyond the Reparameterization Trick
%A Martin Jankowiak
%A Fritz Obermeyer
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-jankowiak18a
%I PMLR
%P 2235--2244
%U https://proceedings.mlr.press/v80/jankowiak18a.html
%V 80
%X We observe that gradients computed via the reparameterization trick are in direct correspondence with solutions of the transport equation in the formalism of optimal transport. We use this perspective to compute (approximate) pathwise gradients for probability distributions not directly amenable to the reparameterization trick: Gamma, Beta, and Dirichlet. We further observe that when the reparameterization trick is applied to the Cholesky-factorized multivariate Normal distribution, the resulting gradients are suboptimal in the sense of optimal transport. We derive the optimal gradients and show that they have reduced variance in a Gaussian Process regression task. We demonstrate with a variety of synthetic experiments and stochastic variational inference tasks that our pathwise gradients are competitive with other methods.
APA
Jankowiak, M. & Obermeyer, F. (2018). Pathwise Derivatives Beyond the Reparameterization Trick. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2235-2244. Available from https://proceedings.mlr.press/v80/jankowiak18a.html.

Related Material