Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

Shingo Yashima, Atsushi Nitanda, Taiji Suzuki
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1954-1962, 2021.

Abstract

Although kernel methods are widely used in many learning problems, they have poor scalability to large datasets. To address this problem, sketching and stochastic gradient methods are the most commonly used techniques for deriving computationally efficient learning algorithms. We consider solving a binary classification problem using random features and stochastic gradient descent (SGD), both of which are common and widely used in practical large-scale problems. Although many previous works have investigated the efficiency of these algorithms in terms of the convergence of the objective loss function, their results suggest that the computational gain comes at the expense of learning accuracy when dealing with general Lipschitz loss functions such as the logistic loss. In this study, we analyze the properties of these algorithms in terms of the convergence not of the loss function, but of the classification error under the strong low-noise condition, which reflects a realistic property of real-world datasets. We extend previous studies on SGD to the random features setting, giving a novel analysis of the error induced by the random features approximation in terms of the distance between the generated hypotheses, and show that exponential convergence of the expected classification error is achieved even when the random features approximation is applied. We demonstrate that the convergence rate does not depend on the number of features, so there is a significant computational benefit to using random features in classification problems under the strong low-noise condition.
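
The following is a minimal, illustrative sketch (in Python with NumPy, not taken from the paper) of the setting the abstract describes: binary classification trained by one-pass averaged SGD on the logistic loss over random Fourier features that approximate a Gaussian kernel. The synthetic data, number of features M, step size, and regularization are hypothetical choices made for illustration; under a well-separated (low-noise) distribution, the classification error of the averaged iterate drops quickly even though M is small.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data with a hard margin along the first coordinate,
# a simple stand-in for the strong low-noise condition.
n, d = 5000, 5
X = rng.normal(size=(n, d))
X[:, 0] += 0.5 * np.sign(X[:, 0])
y = np.where(X[:, 0] > 0, 1.0, -1.0)

# Random Fourier features approximating the Gaussian (RBF) kernel with bandwidth sigma.
M, sigma = 200, 1.0                      # M = number of random features (illustrative)
W = rng.normal(scale=1.0 / sigma, size=(d, M))
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def features(x):
    """Map inputs to the M-dimensional random feature space."""
    return np.sqrt(2.0 / M) * np.cos(x @ W + b)

# One-pass averaged SGD on the regularized logistic loss in the random feature space.
lam, eta = 1e-4, 1.0                     # regularization and base step size (illustrative)
theta = np.zeros(M)
theta_avg = np.zeros(M)

for t in range(n):                       # one fresh sample per iteration
    phi = features(X[t])
    margin = y[t] * (phi @ theta)
    sig = np.exp(-np.logaddexp(0.0, margin))      # = 1 / (1 + exp(margin)), numerically stable
    grad = -y[t] * sig * phi + lam * theta        # stochastic gradient of the regularized loss
    theta -= eta / np.sqrt(t + 1.0) * grad
    theta_avg += (theta - theta_avg) / (t + 1.0)  # Polyak-Ruppert averaging

# Classification error of the averaged iterate on fresh data.
X_test = rng.normal(size=(2000, d))
X_test[:, 0] += 0.5 * np.sign(X_test[:, 0])
y_test = np.where(X_test[:, 0] > 0, 1.0, -1.0)
pred = np.where(features(X_test) @ theta_avg > 0, 1.0, -1.0)
print("test classification error:", float(np.mean(pred != y_test)))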

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-yashima21a,
  title     = {Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features},
  author    = {Yashima, Shingo and Nitanda, Atsushi and Suzuki, Taiji},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1954--1962},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/yashima21a/yashima21a.pdf},
  url       = {https://proceedings.mlr.press/v130/yashima21a.html}
}
Endnote
%0 Conference Paper
%T Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
%A Shingo Yashima
%A Atsushi Nitanda
%A Taiji Suzuki
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-yashima21a
%I PMLR
%P 1954--1962
%U https://proceedings.mlr.press/v130/yashima21a.html
%V 130
APA
Yashima, S., Nitanda, A. & Suzuki, T. (2021). Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1954-1962. Available from https://proceedings.mlr.press/v130/yashima21a.html.