Energetic Natural Gradient Descent

Philip Thomas, Bruno Castro Silva, Christoph Dann, Emma Brunskill
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2887-2895, 2016.

Abstract

We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient.
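In the language of the abstract: a natural-gradient method follows G(θ)^{-1}∇f(θ) rather than the ordinary gradient ∇f(θ), where G(θ) is a metric tensor on the parameter space. Choosing G(θ) to be the Fisher information matrix (which arises from a second-order expansion of KL divergence) gives the classical natural gradient; the paper instead derives G(θ) from an approximation of energy distance, whose standard definition between X ~ p and Y ~ q is 2E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖. Below is a minimal sketch of that shared scaffolding for a toy one-dimensional Gaussian model. The model, function names, and step size are illustrative assumptions, and the exact energetic metric tensor is defined in the paper, not reproduced here.

    # Sketch of natural-gradient ascent with a pluggable metric tensor.
    # Model: unit-variance Gaussian N(theta, 1). "fisher_metric" yields the
    # classical natural gradient; the paper's energetic natural gradient would
    # swap in a metric derived from energy distance instead (its exact form is
    # given in the paper and is not reproduced here).
    import numpy as np

    def score(theta, x):
        # d/dtheta log p_theta(x) for p_theta = N(theta, 1)
        return x - theta

    def fisher_metric(theta, samples):
        # Empirical Fisher information E[s(x) s(x)^T]; a scalar for this 1-D
        # model (it matches the true Fisher only near the true parameter)
        s = score(theta, samples)
        return np.mean(s * s)

    def natural_gradient_step(theta, grad, metric, lr=0.5):
        # Steepest ascent when step length is measured by the metric: G^{-1} grad
        return theta + lr * grad / metric

    rng = np.random.default_rng(0)
    samples = rng.normal(2.0, 1.0, size=1000)   # data drawn from N(2, 1)
    theta = 0.0
    for _ in range(20):
        grad = np.mean(score(theta, samples))   # mean log-likelihood gradient
        theta = natural_gradient_step(theta, grad, fisher_metric(theta, samples))
    print(theta)                                # converges near the sample mean

The only change an energetic natural gradient would make to this loop is the metric: fisher_metric would be replaced by the energy-distance-derived metric tensor the paper constructs, with the update rule otherwise unchanged.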

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-thomasb16,
  title     = {Energetic Natural Gradient Descent},
  author    = {Thomas, Philip and Silva, Bruno Castro and Dann, Christoph and Brunskill, Emma},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {2887--2895},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/thomasb16.pdf},
  url       = {https://proceedings.mlr.press/v48/thomasb16.html},
  abstract  = {We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient.}
}
APA
Thomas, P., Silva, B.C., Dann, C. & Brunskill, E. (2016). Energetic Natural Gradient Descent. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2887-2895. Available from https://proceedings.mlr.press/v48/thomasb16.html.
