Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig
Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, PMLR 137:60-69, 2020.

Abstract

Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers, and ultimately this is an interesting step toward constructing self-tuning optimizers.
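
The abstract describes a filtering view of optimization: a dynamics model predicts how the gradient changes as the parameters move (via Hessian-vector products), that prediction is blended with the noisy mini-batch gradient according to estimated noise levels, and the resulting smoothed gradient also drives the step size choice. The sketch below is only an illustrative approximation of that idea, not the paper's algorithm: it runs a per-coordinate Kalman-style filter on a noisy quadratic with a known diagonal Hessian and known noise variance, and picks the step size from the exact quadratic line-search formula alpha = g^T g / (g^T H g). All names and constants in it are assumptions for illustration.

```python
# Illustrative sketch (not the authors' exact method): a Kalman-style filtered
# gradient with a curvature correction on a noisy quadratic f(x) = 0.5 x^T A x,
# where the observed gradient is corrupted by Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = np.diag(np.linspace(0.1, 1.0, d))   # known diagonal Hessian (assumption)
noise_std = 0.5                         # gradient noise std, assumed known here

x = rng.normal(size=d)                  # parameters
g_hat = np.zeros(d)                     # filtered gradient estimate
P = np.ones(d)                          # per-coordinate estimate variance
R = noise_std ** 2                      # observation-noise variance
Q = 1e-4                                # small process noise keeps the filter responsive

for t in range(200):
    # Noisy "mini-batch" observation of the true gradient A @ x.
    g_obs = A @ x + noise_std * rng.normal(size=d)

    # Kalman-style update: weight prediction vs. observation by their variances.
    P_pred = P + Q
    K = P_pred / (P_pred + R)           # per-coordinate gain
    g_hat = g_hat + K * (g_obs - g_hat)
    P = (1.0 - K) * P_pred

    # Step size from estimated quantities: the exact line-search step for a
    # quadratic, alpha = g^T g / (g^T H g), computed with a Hessian-vector product.
    Hg = A @ g_hat
    alpha = (g_hat @ g_hat) / max(g_hat @ Hg, 1e-12)
    dx = -alpha * g_hat
    x = x + dx

    # Curvature correction (the "dynamics model"): after moving by dx the true
    # gradient shifts by approximately H @ dx, so shift the estimate as well.
    g_hat = g_hat + A @ dx

print("final loss:", 0.5 * x @ A @ x)
```

In the paper, the curvature and noise quantities are obtained from exact per-sample Hessian-vector products and gradients rather than assumed known as above, which is what makes the procedure self-tuning and hyperparameter-free.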

Cite this Paper


BibTeX
@InProceedings{pmlr-v137-chen20a,
  title     = {Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering},
  author    = {Chen, Ricky T. Q. and Choi, Dami and Balles, Lukas and Duvenaud, David and Hennig, Philipp},
  booktitle = {Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops},
  pages     = {60--69},
  year      = {2020},
  editor    = {Zosa Forde, Jessica and Ruiz, Francisco and Pradier, Melanie F. and Schein, Aaron},
  volume    = {137},
  series    = {Proceedings of Machine Learning Research},
  month     = {12 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v137/chen20a/chen20a.pdf},
  url       = {https://proceedings.mlr.press/v137/chen20a.html},
  abstract  = {Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers, and ultimately this is an interesting step toward constructing self-tuning optimizers.}
}
Endnote
%0 Conference Paper
%T Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
%A Ricky T. Q. Chen
%A Dami Choi
%A Lukas Balles
%A David Duvenaud
%A Philipp Hennig
%B Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops
%C Proceedings of Machine Learning Research
%D 2020
%E Jessica Zosa Forde
%E Francisco Ruiz
%E Melanie F. Pradier
%E Aaron Schein
%F pmlr-v137-chen20a
%I PMLR
%P 60--69
%U https://proceedings.mlr.press/v137/chen20a.html
%V 137
%X Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers, and ultimately this is an interesting step toward constructing self-tuning optimizers.
APA
Chen, R.T.Q., Choi, D., Balles, L., Duvenaud, D. & Hennig, P. (2020). Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering. Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, in Proceedings of Machine Learning Research 137:60-69. Available from https://proceedings.mlr.press/v137/chen20a.html.