HeteRSGD: Tackling Heterogeneous Sampling Costs via Optimal Reweighted Stochastic Gradient Descent

Ziang Chen, Jianfeng Lu, Huajie Qian, Xinshang Wang, Wotao Yin
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10732-10781, 2023.

Abstract

One implicit assumption in current stochastic gradient descent (SGD) algorithms is the identical cost for sampling each component function of the finite-sum objective. However, there are applications where the costs differ substantially, for which SGD schemes with uniform sampling invoke a high sampling load. We investigate the use of importance sampling (IS) as a cost saver in this setting, in contrast to its traditional use for variance reduction. The key ingredient is a novel efficiency metric for IS that advocates low sampling costs while penalizing high gradient variances. We then propose HeteRSGD, an SGD scheme that performs gradient sampling according to optimal probability weights stipulated by the metric, and establish theories on its optimal asymptotic and finite-time convergence rates among all possible IS-based SGD schemes. We show that the relative efficiency gain of HeteRSGD can be arbitrarily large regardless of the problem dimension and number of components. Our theoretical results are validated numerically for both convex and nonconvex problems.
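To make the mechanism in the abstract concrete, below is a minimal Python sketch of importance-sampling (IS) SGD on a finite-sum objective with per-component sampling costs. It is not the paper's exact HeteRSGD rule: the estimator rescaling by 1/(n p_i) is the standard unbiased IS construction, while the particular weight choice (p_i proportional to a gradient-scale estimate divided by the square root of the cost) and all names (is_sgd, grad_fns, costs, grad_scales, budget) are illustrative assumptions only.

import numpy as np

# Minimal sketch (not the paper's exact scheme): importance-sampling SGD on
# f(x) = (1/n) * sum_i f_i(x), where sampling component i costs costs[i].
# Drawing index i with probability p[i] and rescaling the gradient by
# 1/(n * p[i]) keeps the update unbiased for any positive p.

def is_sgd(grad_fns, costs, grad_scales, x0, lr=0.1, budget=100.0):
    """grad_fns[i](x) returns the gradient of f_i; costs[i] is the cost of
    sampling component i; grad_scales[i] is a rough bound on ||grad f_i||.
    The weight rule below (p_i proportional to grad_scales[i] / sqrt(costs[i]))
    is only an illustrative cost-aware heuristic, hypothetical here."""
    n = len(grad_fns)
    p = np.asarray(grad_scales, dtype=float) / np.sqrt(np.asarray(costs, dtype=float))
    p = p / p.sum()                      # sampling distribution over components
    x, spent = np.asarray(x0, dtype=float), 0.0
    while spent < budget:                # stop once the sampling budget is spent
        i = np.random.choice(n, p=p)
        g = grad_fns[i](x) / (n * p[i])  # reweighting keeps the estimator unbiased
        x = x - lr * g
        spent += costs[i]
    return x

With uniform p this reduces to standard SGD; skewing p toward cheaper components lowers the expected cost per iteration at the price of a larger variance factor 1/(n p_i), which is the cost-versus-variance trade-off the paper's efficiency metric is designed to optimize.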

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-chen23i,
  title     = {HeteRSGD: Tackling Heterogeneous Sampling Costs via Optimal Reweighted Stochastic Gradient Descent},
  author    = {Chen, Ziang and Lu, Jianfeng and Qian, Huajie and Wang, Xinshang and Yin, Wotao},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {10732--10781},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/chen23i/chen23i.pdf},
  url       = {https://proceedings.mlr.press/v206/chen23i.html},
  abstract  = {One implicit assumption in current stochastic gradient descent (SGD) algorithms is the identical cost for sampling each component function of the finite-sum objective. However, there are applications where the costs differ substantially, for which SGD schemes with uniform sampling invoke a high sampling load. We investigate the use of importance sampling (IS) as a cost saver in this setting, in contrast to its traditional use for variance reduction. The key ingredient is a novel efficiency metric for IS that advocates low sampling costs while penalizing high gradient variances. We then propose HeteRSGD, an SGD scheme that performs gradient sampling according to optimal probability weights stipulated by the metric, and establish theories on its optimal asymptotic and finite-time convergence rates among all possible IS-based SGD schemes. We show that the relative efficiency gain of HeteRSGD can be arbitrarily large regardless of the problem dimension and number of components. Our theoretical results are validated numerically for both convex and nonconvex problems.}
}
Endnote
%0 Conference Paper
%T HeteRSGD: Tackling Heterogeneous Sampling Costs via Optimal Reweighted Stochastic Gradient Descent
%A Ziang Chen
%A Jianfeng Lu
%A Huajie Qian
%A Xinshang Wang
%A Wotao Yin
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-chen23i
%I PMLR
%P 10732--10781
%U https://proceedings.mlr.press/v206/chen23i.html
%V 206
%X One implicit assumption in current stochastic gradient descent (SGD) algorithms is the identical cost for sampling each component function of the finite-sum objective. However, there are applications where the costs differ substantially, for which SGD schemes with uniform sampling invoke a high sampling load. We investigate the use of importance sampling (IS) as a cost saver in this setting, in contrast to its traditional use for variance reduction. The key ingredient is a novel efficiency metric for IS that advocates low sampling costs while penalizing high gradient variances. We then propose HeteRSGD, an SGD scheme that performs gradient sampling according to optimal probability weights stipulated by the metric, and establish theories on its optimal asymptotic and finite-time convergence rates among all possible IS-based SGD schemes. We show that the relative efficiency gain of HeteRSGD can be arbitrarily large regardless of the problem dimension and number of components. Our theoretical results are validated numerically for both convex and nonconvex problems.
APA
Chen, Z., Lu, J., Qian, H., Wang, X. & Yin, W. (2023). HeteRSGD: Tackling Heterogeneous Sampling Costs via Optimal Reweighted Stochastic Gradient Descent. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10732-10781. Available from https://proceedings.mlr.press/v206/chen23i.html.