History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms

Kaiyi Ji, Zhe Wang, Bowen Weng, Yi Zhou, Wei Zhang, Yingbin Liang
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4762-4772, 2020.

Abstract

Variance-reduced algorithms, although achieving strong theoretical performance, can run slowly in practice due to periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply a prescribed batch-size adaptation rule or exploit the information along the optimization path via additional backtracking and condition-verification steps. In this paper, we propose a novel scheme that eliminates backtracking line search but still exploits the information along the optimization path by adapting the batch size via history stochastic gradients. We further show theoretically that such a scheme substantially reduces the overall complexity of the popular variance-reduced algorithms SVRG and SARAH/SPIDER, for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-ji20a, title = {History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms}, author = {Ji, Kaiyi and Wang, Zhe and Weng, Bowen and Zhou, Yi and Zhang, Wei and Liang, Yingbin}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {4762--4772}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/ji20a/ji20a.pdf}, url = {http://proceedings.mlr.press/v119/ji20a.html}, abstract = {Variance-reduced algorithms, although achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply prescribed batch-size adaption rule or exploit the information along optimization path via additional backtracking and condition verification steps. In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along optimization path by adapting the batch size via history stochastic gradients. We further theoretically show that such a scheme substantially reduces the overall complexity for popular variance-reduced algorithms SVRG and SARAH/SPIDER for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.} }
Endnote
%0 Conference Paper %T History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms %A Kaiyi Ji %A Zhe Wang %A Bowen Weng %A Yi Zhou %A Wei Zhang %A Yingbin Liang %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-ji20a %I PMLR %P 4762--4772 %U http://proceedings.mlr.press/v119/ji20a.html %V 119 %X Variance-reduced algorithms, although achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply prescribed batch-size adaption rule or exploit the information along optimization path via additional backtracking and condition verification steps. In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along optimization path by adapting the batch size via history stochastic gradients. We further theoretically show that such a scheme substantially reduces the overall complexity for popular variance-reduced algorithms SVRG and SARAH/SPIDER for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.
APA
Ji, K., Wang, Z., Weng, B., Zhou, Y., Zhang, W. & Liang, Y. (2020). History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4762-4772. Available from http://proceedings.mlr.press/v119/ji20a.html.