Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

Radu Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2815-2825, 2021.

Abstract

We study the problem of minimizing a relatively-smooth convex function using stochastic Bregman gradient methods. We first prove the convergence of Bregman Stochastic Gradient Descent (BSGD) to a region that depends on the noise (magnitude of the gradients) at the optimum. In particular, BSGD quickly converges to the exact minimizer when this noise is zero (interpolation setting, in which the data is fit perfectly). Otherwise, when the objective has a finite sum structure, we show that variance reduction can be used to counter the effect of noise. In particular, fast convergence to the exact minimizer can be obtained under additional regularity assumptions on the Bregman reference function. We illustrate the effectiveness of our approach on two key applications of relative smoothness: tomographic reconstruction with Poisson noise and statistical preconditioning for distributed optimization.
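For context, here is a minimal sketch of the standard relative-smoothness condition and the Bregman (mirror) stochastic gradient update that BSGD builds on; the reference function h, step sizes \eta_k, and stochastic gradients g_k are generic notation rather than the paper's own, and the paper's exact assumptions and rates may differ:

    D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle, \qquad D_f(x, y) \le L\, D_h(x, y) \quad \text{(relative smoothness)},

    x_{k+1} = \operatorname*{arg\,min}_{x} \; \langle g_k, x \rangle + \tfrac{1}{\eta_k}\, D_h(x, x_k), \qquad \mathbb{E}[g_k \mid x_k] = \nabla f(x_k).

In the interpolation setting mentioned in the abstract, the stochastic gradients vanish at the minimizer, which is why BSGD can reach the exact solution there without variance reduction.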

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-dragomir21a,
  title     = {Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction},
  author    = {Dragomir, Radu Alexandru and Even, Mathieu and Hendrikx, Hadrien},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2815--2825},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/dragomir21a/dragomir21a.pdf},
  url       = {https://proceedings.mlr.press/v139/dragomir21a.html},
  abstract  = {We study the problem of minimizing a relatively-smooth convex function using stochastic Bregman gradient methods. We first prove the convergence of Bregman Stochastic Gradient Descent (BSGD) to a region that depends on the noise (magnitude of the gradients) at the optimum. In particular, BSGD quickly converges to the exact minimizer when this noise is zero (interpolation setting, in which the data is fit perfectly). Otherwise, when the objective has a finite sum structure, we show that variance reduction can be used to counter the effect of noise. In particular, fast convergence to the exact minimizer can be obtained under additional regularity assumptions on the Bregman reference function. We illustrate the effectiveness of our approach on two key applications of relative smoothness: tomographic reconstruction with Poisson noise and statistical preconditioning for distributed optimization.}
}
Endnote
%0 Conference Paper
%T Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction
%A Radu Alexandru Dragomir
%A Mathieu Even
%A Hadrien Hendrikx
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-dragomir21a
%I PMLR
%P 2815--2825
%U https://proceedings.mlr.press/v139/dragomir21a.html
%V 139
%X We study the problem of minimizing a relatively-smooth convex function using stochastic Bregman gradient methods. We first prove the convergence of Bregman Stochastic Gradient Descent (BSGD) to a region that depends on the noise (magnitude of the gradients) at the optimum. In particular, BSGD quickly converges to the exact minimizer when this noise is zero (interpolation setting, in which the data is fit perfectly). Otherwise, when the objective has a finite sum structure, we show that variance reduction can be used to counter the effect of noise. In particular, fast convergence to the exact minimizer can be obtained under additional regularity assumptions on the Bregman reference function. We illustrate the effectiveness of our approach on two key applications of relative smoothness: tomographic reconstruction with Poisson noise and statistical preconditioning for distributed optimization.
APA
Dragomir, R.A., Even, M. & Hendrikx, H. (2021). Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2815-2825. Available from https://proceedings.mlr.press/v139/dragomir21a.html.