Fast Probabilistic Optimization from Noisy Gradients

Philipp Hennig

Fast Probabilistic Optimization from Noisy Gradients

Philipp Hennig

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):62-70, 2013.

Abstract

Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it available, it would turn linearly efficient gradient descent into the quadratically efficient Newton-Raphson optimization. Existing covariant methods, though, are either super-linearly expensive or do not address noise. Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient. Importantly, the resulting algorithm, like stochastic gradient descent, has cost linear in the number of input dimensions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-hennig13,
  title = 	 {Fast Probabilistic Optimization from Noisy Gradients},
  author = 	 {Hennig, Philipp},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {62--70},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/hennig13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/hennig13.html},
  abstract = 	 {Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it available, it would turn linearly efficient gradient descent into the quadratically efficient Newton-Raphson optimization. Existing covariant methods, though, are either super-linearly expensive or do not address noise. Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient. Importantly, the resulting algorithm, like stochastic gradient descent, has cost linear in the number of input dimensions.}
}

Endnote

%0 Conference Paper
%T Fast Probabilistic Optimization from Noisy Gradients
%A Philipp Hennig
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-hennig13
%I PMLR
%P 62--70
%U https://proceedings.mlr.press/v28/hennig13.html
%V 28
%N 1
%X Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it available, it would turn linearly efficient gradient descent into the quadratically efficient Newton-Raphson optimization. Existing covariant methods, though, are either super-linearly expensive or do not address noise. Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient. Importantly, the resulting algorithm, like stochastic gradient descent, has cost linear in the number of input dimensions.

RIS


TY  - CPAPER
TI  - Fast Probabilistic Optimization from Noisy Gradients
AU  - Philipp Hennig
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/02/13
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-hennig13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 1
SP  - 62
EP  - 70
L1  - http://proceedings.mlr.press/v28/hennig13.pdf
UR  - https://proceedings.mlr.press/v28/hennig13.html
AB  - Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it available, it would turn linearly efficient gradient descent into the quadratically efficient Newton-Raphson optimization. Existing covariant methods, though, are either super-linearly expensive or do not address noise. Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient. Importantly, the resulting algorithm, like stochastic gradient descent, has cost linear in the number of input dimensions.
ER  -

APA


Hennig, P.. (2013). Fast Probabilistic Optimization from Noisy Gradients. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(1):62-70 Available from https://proceedings.mlr.press/v28/hennig13.html.

Related Material

Download PDF