A Novel Sequential Coreset Method for Gradient Descent Algorithms

Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos Freris, Hu Ding
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4412-4422, 2021.

Abstract

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. The coreset is a popular data compression technique that has been extensively studied. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bounds, which can be very high or hard to obtain. In this paper, based on the “locality” property of gradient descent algorithms, we propose a new framework, termed “sequential coreset”, which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to depend only poly-logarithmically on the dimension. In practice, the experimental results suggest that our method can save a substantial amount of running time compared with the baseline algorithms.
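To make the general idea concrete, the sketch below shows the common pattern of running gradient descent on a small weighted subsample instead of the full dataset, and periodically rebuilding that subsample around the current iterate (the “locality” intuition). This is only an illustrative sketch, not the paper's actual construction or guarantees: the function and parameter names (build_local_coreset, coreset_size, n_rounds) are hypothetical, and the loss-proportional sampling used here is a crude stand-in for a proper sensitivity-based scheme.

```python
import numpy as np

def loss_grad(w, X, y, weights):
    """Weighted least-squares gradient, used as a stand-in for a generic smooth loss."""
    residual = X @ w - y
    return X.T @ (weights * residual) / weights.sum()

def build_local_coreset(X, y, w, size, rng):
    """Sample points with probability proportional to their current loss
    (a crude importance proxy) and reweight so the sampled sum is unbiased."""
    per_point_loss = 0.5 * (X @ w - y) ** 2 + 1e-12
    prob = per_point_loss / per_point_loss.sum()
    idx = rng.choice(len(y), size=size, replace=True, p=prob)
    weights = 1.0 / (size * prob[idx])  # inverse-probability weights
    return X[idx], y[idx], weights

def sequential_coreset_gd(X, y, coreset_size=200, n_rounds=10,
                          steps_per_round=50, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        # Rebuild the compressed dataset around the current iterate,
        # then take several cheap gradient steps on the small weighted sample.
        Xc, yc, wc = build_local_coreset(X, y, w, coreset_size, rng)
        for _ in range(steps_per_round):
            w -= lr * loss_grad(w, Xc, yc, wc)
    return w
```

Each round costs time proportional to the coreset size rather than the full dataset size, which is where the running-time savings reported in the paper come from; the step size and round counts above are arbitrary illustrative values.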

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-huang21b,
  title     = {A Novel Sequential Coreset Method for Gradient Descent Algorithms},
  author    = {Huang, Jiawei and Huang, Ruomin and Liu, Wenjie and Freris, Nikolaos and Ding, Hu},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {4412--4422},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/huang21b/huang21b.pdf},
  url       = {https://proceedings.mlr.press/v139/huang21b.html},
  abstract  = {A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. Coreset is a popular data compression technique that has been extensively studied before. However, most of existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound that can be very high or hard to obtain. In this paper, based on the “locality” property of gradient descent algorithms, we propose a new framework, termed “sequential coreset”, which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization whence the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.}
}
Endnote
%0 Conference Paper
%T A Novel Sequential Coreset Method for Gradient Descent Algorithms
%A Jiawei Huang
%A Ruomin Huang
%A Wenjie Liu
%A Nikolaos Freris
%A Hu Ding
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-huang21b
%I PMLR
%P 4412--4422
%U https://proceedings.mlr.press/v139/huang21b.html
%V 139
%X A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. Coreset is a popular data compression technique that has been extensively studied before. However, most of existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound that can be very high or hard to obtain. In this paper, based on the “locality” property of gradient descent algorithms, we propose a new framework, termed “sequential coreset”, which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization whence the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.
APA
Huang, J., Huang, R., Liu, W., Freris, N. & Ding, H. (2021). A Novel Sequential Coreset Method for Gradient Descent Algorithms. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4412-4422. Available from https://proceedings.mlr.press/v139/huang21b.html.