Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

Atsushi Nitanda, Taiji Suzuki
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1417-1426, 2019.

Abstract

We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk, even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. It is therefore important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the provided theoretical analysis was limited to the square loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. For averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on $L_2$-regularized logistic regression.
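To make the setting concrete, below is a minimal sketch (not the paper's algorithm or code) of averaged stochastic gradient descent for $L_2$-regularized logistic regression on a synthetic, noiseless problem, which trivially satisfies a strong low-noise condition since the conditional label probabilities are 0 or 1. Plain linear features stand in for the RKHS setting, and the data size, step-size schedule, and regularization strength are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic, linearly separable data: labels are a deterministic function of x,
# so |P(Y=1|x) - 1/2| = 1/2 (a strong low-noise condition holds trivially).
n, d = 5000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)
y[y == 0] = 1.0                    # guard against exact zeros

lam = 1e-3                          # L2-regularization strength (illustrative)
w = np.zeros(d)                     # current SGD iterate
w_bar = np.zeros(d)                 # Polyak-Ruppert average of the iterates

for t in range(1, n + 1):
    i = rng.integers(n)
    xi, yi = X[i], y[i]
    margin = yi * (xi @ w)
    # Stochastic gradient of log(1 + exp(-margin)) + (lam/2) * ||w||^2
    grad = -yi * xi / (1.0 + np.exp(margin)) + lam * w
    eta = 1.0 / (1.0 + lam * t)     # a simple decreasing step size (illustrative)
    w -= eta * grad
    w_bar += (w - w_bar) / t        # running average: the averaged SGD iterate

err = np.mean(np.sign(X @ w_bar) != y)
print(f"training classification error of the averaged iterate: {err:.4f}")

The paper's results concern the expected (test) classification error of the sign of the learned predictor in an RKHS; the script above only reports training error of the averaged linear iterate as a sanity check.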

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-nitanda19a,
  title = {Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors},
  author = {Nitanda, Atsushi and Suzuki, Taiji},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = {1417--1426},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = {89},
  series = {Proceedings of Machine Learning Research},
  month = {16--18 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v89/nitanda19a/nitanda19a.pdf},
  url = {https://proceedings.mlr.press/v89/nitanda19a.html},
  abstract = {We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition but provided theoretical analysis was limited to the square loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. As for the averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on the $L_2$-regularized logistic regression.}
}
Endnote
%0 Conference Paper
%T Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
%A Atsushi Nitanda
%A Taiji Suzuki
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-nitanda19a
%I PMLR
%P 1417--1426
%U https://proceedings.mlr.press/v89/nitanda19a.html
%V 89
%X We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition but provided theoretical analysis was limited to the square loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. As for the averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on the $L_2$-regularized logistic regression.
APA
Nitanda, A. & Suzuki, T. (2019). Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1417-1426. Available from https://proceedings.mlr.press/v89/nitanda19a.html.