Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions

Alexandre Defossez, Francis Bach
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:205-213, 2015.

Abstract

We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-defossez15, title = {{Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions}}, author = {Defossez, Alexandre and Bach, Francis}, booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics}, pages = {205--213}, year = {2015}, editor = {Lebanon, Guy and Vishwanathan, S. V. N.}, volume = {38}, series = {Proceedings of Machine Learning Research}, address = {San Diego, California, USA}, month = {09--12 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v38/defossez15.pdf}, url = {https://proceedings.mlr.press/v38/defossez15.html}, abstract = {We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.} }
Endnote
%0 Conference Paper %T Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions %A Alexandre Defossez %A Francis Bach %B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2015 %E Guy Lebanon %E S. V. N. Vishwanathan %F pmlr-v38-defossez15 %I PMLR %P 205--213 %U https://proceedings.mlr.press/v38/defossez15.html %V 38 %X We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.
RIS
TY - CPAPER TI - Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions AU - Alexandre Defossez AU - Francis Bach BT - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics DA - 2015/02/21 ED - Guy Lebanon ED - S. V. N. Vishwanathan ID - pmlr-v38-defossez15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 38 SP - 205 EP - 213 L1 - http://proceedings.mlr.press/v38/defossez15.pdf UR - https://proceedings.mlr.press/v38/defossez15.html AB - We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements. ER -
APA
Defossez, A. & Bach, F.. (2015). Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:205-213 Available from https://proceedings.mlr.press/v38/defossez15.html.

Related Material