Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:205-213, 2015.
We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size g, and a bias term that decays as O(1/g^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.