Fast Probabilistic Optimization from Noisy Gradients

Philipp Hennig
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):62-70, 2013.

Abstract

Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it available, it would turn linearly efficient gradient descent into the quadratically efficient Newton-Raphson optimization. Existing covariant methods, though, are either super-linearly expensive or do not address noise. Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient. Importantly, the resulting algorithm, like stochastic gradient descent, has cost linear in the number of input dimensions.
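To make the contrast drawn in the abstract concrete, the sketch below compares a plain gradient-descent step with a Newton-Raphson step that rescales the gradient by the inverse Hessian, on a toy ill-conditioned quadratic. This is only an illustration of that general contrast, not the quasi-Newton algorithm developed in the paper; the matrix, step size, and iteration count are arbitrary choices for the example.

import numpy as np

# f(x) = 0.5 * x^T A x, with A the (constant) Hessian; condition number 100.
A = np.diag([1.0, 100.0])

def grad(x):
    return A @ x  # exact gradient of the quadratic

# Plain gradient descent: the step size must stay below 2/100 to remain stable
# in the steep direction, which makes progress in the shallow direction slow.
x_gd = np.array([1.0, 1.0])
for _ in range(50):
    x_gd = x_gd - 0.009 * grad(x_gd)

# Newton-Raphson step: rescale the gradient by the inverse Hessian.
# For a quadratic objective this lands on the minimizer in a single step.
x_nr = np.array([1.0, 1.0])
x_nr = x_nr - np.linalg.solve(A, grad(x_nr))

print(x_gd)  # roughly [0.64, 0.0]: still far from the optimum along the shallow direction
print(x_nr)  # [0.0, 0.0]: the exact minimizer

The paper's contribution, by contrast, is to infer such curvature information from noisy gradient observations alone, at cost linear in the number of input dimensions.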

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-hennig13,
  title     = {Fast Probabilistic Optimization from Noisy Gradients},
  author    = {Hennig, Philipp},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {62--70},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {1},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/hennig13.pdf},
  url       = {https://proceedings.mlr.press/v28/hennig13.html}
}
APA
Hennig, P. (2013). Fast Probabilistic Optimization from Noisy Gradients. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(1):62-70. Available from https://proceedings.mlr.press/v28/hennig13.html.
