Can Gaussian Sketching Converge Faster on a Preconditioned Landscape?

Yilong Wang, Haishan Ye, Guang Dai, Ivor Tsang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:52064-52082, 2024.

Abstract

This paper focuses on large-scale optimization, a central problem in the big data era. Gradient sketching is an important technique for large-scale optimization; for example, random coordinate descent is a gradient sketching method whose sketching matrix is a random sampling matrix. In this paper, we propose a novel gradient sketching method called GSGD (Gaussian Sketched Gradient Descent). Compared with classical gradient sketching methods such as random coordinate descent and SEGA (Hanzely et al., 2018), GSGD does not require importance sampling, yet it achieves a fast convergence rate matching that of these methods with importance sampling. Furthermore, when the objective function contains a non-smooth regularization term, GSGD can exploit the implicit structure of the regularizer to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of the algorithm.
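
A rough illustration of the gradient-sketching idea described in the abstract may help; the sketch below is not the paper's GSGD algorithm (which works on a preconditioned landscape and handles a non-smooth regularizer), but a generic Gaussian-sketched gradient step written in Python. The function name, sketch size k, step size, and toy least-squares problem are assumptions chosen for illustration only.

import numpy as np

def gaussian_sketched_step(x, grad_fn, step_size, k, rng):
    # One gradient step where the full gradient is replaced by a rank-k Gaussian sketch.
    d = x.shape[0]
    # Gaussian sketching matrix S with i.i.d. N(0, 1/k) entries, shape (d, k).
    # Since E[S @ S.T] = I, the sketched direction S @ (S.T @ g) is an unbiased
    # estimate of the full gradient g; a genuine sketching implementation only
    # needs the k directional derivatives S.T @ g, not g itself.
    S = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
    g = grad_fn(x)
    return x - step_size * (S @ (S.T @ g))

# Usage on a toy least-squares objective f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
b = rng.normal(size=200)
grad_fn = lambda x: A.T @ (A @ x - b)
x = np.zeros(50)
for _ in range(2000):
    x = gaussian_sketched_step(x, grad_fn, step_size=1e-4, k=5, rng=rng)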

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24ch, title = {Can {G}aussian Sketching Converge Faster on a Preconditioned Landscape?}, author = {Wang, Yilong and Ye, Haishan and Dai, Guang and Tsang, Ivor}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {52064--52082}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24ch/wang24ch.pdf}, url = {https://proceedings.mlr.press/v235/wang24ch.html}, abstract = {This paper focuses on the large-scale optimization which is very popular in the big data era. The gradient sketching is an important technique in the large-scale optimization. Specifically, the random coordinate descent algorithm is a kind of gradient sketching method with the random sampling matrix as the sketching matrix. In this paper, we propose a novel gradient sketching called GSGD (Gaussian Sketched Gradient Descent). Compared with the classical gradient sketching methods such as the random coordinate descent and SEGA (Hanzely et al., 2018), our GSGD does not require the importance sampling but can achieve a fast convergence rate matching the ones of these methods with importance sampling. Furthermore, if the objective function has a non-smooth regularization term, our GSGD can also exploit the implicit structure information of the regularization term to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of our algorithm.} }
Endnote
%0 Conference Paper %T Can Gaussian Sketching Converge Faster on a Preconditioned Landscape? %A Yilong Wang %A Haishan Ye %A Guang Dai %A Ivor Tsang %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-wang24ch %I PMLR %P 52064--52082 %U https://proceedings.mlr.press/v235/wang24ch.html %V 235 %X This paper focuses on the large-scale optimization which is very popular in the big data era. The gradient sketching is an important technique in the large-scale optimization. Specifically, the random coordinate descent algorithm is a kind of gradient sketching method with the random sampling matrix as the sketching matrix. In this paper, we propose a novel gradient sketching called GSGD (Gaussian Sketched Gradient Descent). Compared with the classical gradient sketching methods such as the random coordinate descent and SEGA (Hanzely et al., 2018), our GSGD does not require the importance sampling but can achieve a fast convergence rate matching the ones of these methods with importance sampling. Furthermore, if the objective function has a non-smooth regularization term, our GSGD can also exploit the implicit structure information of the regularization term to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of our algorithm.
APA
Wang, Y., Ye, H., Dai, G. & Tsang, I. (2024). Can Gaussian Sketching Converge Faster on a Preconditioned Landscape? Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:52064-52082. Available from https://proceedings.mlr.press/v235/wang24ch.html.
