Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions

Alexandre Defossez; Francis Bach

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:205-213, 2015.

Abstract

We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term that decays as O(1/n), independently of the step-size g, and a bias term that decays as O(1/(g^2 n^2)); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.
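The algorithm the abstract analyzes, averaged constant-step-size stochastic gradient descent on least squares, can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function name `averaged_lms`, the synthetic Gaussian data, the noise level, and the value chosen for the step-size (the abstract's g, here `step_size`) are all assumptions made for the example, and the averaging convention (a uniform running mean of the iterates after each update) may differ in detail from the one used in the paper.

```python
import numpy as np


def averaged_lms(X, y, step_size, theta0=None):
    """Constant-step-size LMS with a uniform running average of the iterates.

    Hypothetical helper for illustration; one pass over the rows of X.
    """
    n, d = X.shape
    theta = np.zeros(d) if theta0 is None else np.asarray(theta0, dtype=float).copy()
    theta_bar = np.zeros(d)
    for i in range(n):
        # Stochastic gradient of 0.5 * (x_i . theta - y_i)^2 with respect to theta.
        residual = X[i] @ theta - y[i]
        theta = theta - step_size * residual * X[i]
        # Incremental mean of the iterates produced so far.
        theta_bar += (theta - theta_bar) / (i + 1)
    return theta_bar


# Synthetic well-specified problem (assumed setup, not taken from the paper).
rng = np.random.default_rng(0)
n, d = 20000, 5
theta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_star + 0.1 * rng.normal(size=n)

theta_bar = averaged_lms(X, y, step_size=0.05)
error = float(np.linalg.norm(theta_bar - theta_star))
```

In this setup the distance between `theta_bar` and `theta_star` reflects both contributions named in the abstract: the noise in `y` drives the variance term, while the distance from the zero initialization to `theta_star` drives the bias term, which is the part that depends on the step-size.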

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-defossez15,
  title = 	 {{Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions}},
  author = 	 {Defossez, Alexandre and Bach, Francis},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {205--213},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/defossez15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/defossez15.html},
  abstract = 	 {We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.}
}

Endnote

%0 Conference Paper
%T Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions
%A Alexandre Defossez
%A Francis Bach
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-defossez15
%I PMLR
%P 205--213
%U https://proceedings.mlr.press/v38/defossez15.html
%V 38
%X We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.

RIS


TY  - CPAPER
TI  - Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions
AU  - Alexandre Defossez
AU  - Francis Bach
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-defossez15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 205
EP  - 213
L1  - http://proceedings.mlr.press/v38/defossez15.pdf
UR  - https://proceedings.mlr.press/v38/defossez15.html
AB  - We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.
ER  -

APA


Defossez, A. & Bach, F. (2015). Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:205-213. Available from https://proceedings.mlr.press/v38/defossez15.html.
