Stochastic linear optimization never overfits with quadratically-bounded losses on general data

Matus Telgarsky
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:5453-5488, 2022.

Abstract

This work provides test error bounds for iterative fixed point methods on linear predictors — specifically, stochastic and batch mirror descent (MD), and stochastic temporal difference learning (TD) — with two core contributions: (a) a single proof technique which gives high probability guarantees despite the absence of projections, regularization, or any equivalents, even when optima have large or infinite norm, for quadratically-bounded losses (e.g., providing unified treatment of squared and logistic losses); (b) locally-adapted rates which depend not on global problem structure (such as condition numbers and maximum margins), but rather on properties of low norm predictors which may suffer some small excess test error. The proof technique is an elementary and versatile coupling argument, and is demonstrated here in the following settings: stochastic MD under realizability; stochastic MD for general Markov data; batch MD for general IID data; stochastic MD on heavy-tailed data (still without projections); stochastic TD on approximately mixing Markov chains (all prior stochastic TD bounds are in expectation).
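For readers who want a concrete picture of the algorithmic setting, the following is a minimal illustrative sketch (not the paper's exact construction or analysis): unprojected stochastic mirror descent with the Euclidean mirror map, i.e. plain SGD, on a linear predictor with logistic loss, one of the quadratically-bounded losses mentioned above. The function names and the synthetic data are the author of this sketch's own choices, not from the paper; note that no projection, regularization, or norm constraint is applied to the iterates.

import numpy as np

def logistic_loss_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) with respect to w (labels in {-1, +1})."""
    margin = y * np.dot(w, x)
    return -y * x / (1.0 + np.exp(margin))

def unprojected_sgd(data, step_size=0.1):
    """One pass of plain SGD over (x, y) pairs; iterates are never projected or regularized."""
    dim = data[0][0].shape[0]
    w = np.zeros(dim)
    for x, y in data:
        w -= step_size * logistic_loss_grad(w, x, y)
    return w

# Tiny synthetic example: separable data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ rng.normal(size=5))
w_hat = unprojected_sgd(list(zip(X, y)))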

Cite this Paper

BibTeX
@InProceedings{pmlr-v178-telgarsky22a,
  title     = {Stochastic linear optimization never overfits with quadratically-bounded losses on general data},
  author    = {Telgarsky, Matus},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {5453--5488},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/telgarsky22a/telgarsky22a.pdf},
  url       = {https://proceedings.mlr.press/v178/telgarsky22a.html},
  abstract  = {This work provides test error bounds for iterative fixed point methods on linear predictors — specifically, stochastic and batch mirror descent (MD), and stochastic temporal difference learning (TD) — with two core contributions: (a) a single proof technique which gives high probability guarantees despite the absence of projections, regularization, or any equivalents, even when optima have large or infinite norm, for quadratically-bounded losses (e.g., providing unified treatment of squared and logistic losses); (b) locally-adapted rates which depend not on global problem structure (such as condition numbers and maximum margins), but rather on properties of low norm predictors which may suffer some small excess test error. The proof technique is an elementary and versatile coupling argument, and is demonstrated here in the following settings: stochastic MD under realizability; stochastic MD for general Markov data; batch MD for general IID data; stochastic MD on heavy-tailed data (still without projections); stochastic TD on approximately mixing Markov chains (all prior stochastic TD bounds are in expectation).}
}
Endnote
%0 Conference Paper
%T Stochastic linear optimization never overfits with quadratically-bounded losses on general data
%A Matus Telgarsky
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-telgarsky22a
%I PMLR
%P 5453--5488
%U https://proceedings.mlr.press/v178/telgarsky22a.html
%V 178
%X This work provides test error bounds for iterative fixed point methods on linear predictors — specifically, stochastic and batch mirror descent (MD), and stochastic temporal difference learning (TD) — with two core contributions: (a) a single proof technique which gives high probability guarantees despite the absence of projections, regularization, or any equivalents, even when optima have large or infinite norm, for quadratically-bounded losses (e.g., providing unified treatment of squared and logistic losses); (b) locally-adapted rates which depend not on global problem structure (such as condition numbers and maximum margins), but rather on properties of low norm predictors which may suffer some small excess test error. The proof technique is an elementary and versatile coupling argument, and is demonstrated here in the following settings: stochastic MD under realizability; stochastic MD for general Markov data; batch MD for general IID data; stochastic MD on heavy-tailed data (still without projections); stochastic TD on approximately mixing Markov chains (all prior stochastic TD bounds are in expectation).
APA
Telgarsky, M. (2022). Stochastic linear optimization never overfits with quadratically-bounded losses on general data. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:5453-5488. Available from https://proceedings.mlr.press/v178/telgarsky22a.html.