A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging

Wei-Cheng Lee, Francesco Orabona
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:4588-4634, 2026.

Abstract

We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain \emph{unprojected} TD(0) algorithm with Polyak–Ruppert (PR) averaging, using a \emph{single} stepsize schedule $\eta_t \propto 1/(\tau_{\mathrm{mix}}\log(t)\,\sqrt{t})$ that depends on mixing time but requires \emph{no prior knowledge of the curvature parameter $\omega$}. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded \emph{with high probability}, without projections and without any stability argument based on $\omega$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}/\sqrt{T})$ rate and a fast curvature-dependent $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}^2/(\omega T))$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-lee26d, title = {A Single Stepsize Suffices for Unprojected Linear {TD(0)}: Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging}, author = {Lee, Wei-Cheng and Orabona, Francesco}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {4588--4634}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/lee26d/lee26d.pdf}, url = {https://proceedings.mlr.press/v336/lee26d.html}, abstract = {We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain \emph{unprojected} TD(0) algorithm with Polyak–Ruppert (PR) averaging, using a \emph{single} stepsize schedule $\eta_t \propto 1/(\tau_{\mathrm{mix}}\log(t)\,\sqrt{t})$ that depends on mixing time but requires \emph{no prior knowledge of the curvature parameter $\omega$}. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded \emph{with high probability}, without projections and without any stability argument based on $\omega$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}/\sqrt{T})$ rate and a fast curvature-dependent $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}^2/(\omega T))$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.} }
Endnote
%0 Conference Paper %T A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging %A Wei-Cheng Lee %A Francesco Orabona %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-lee26d %I PMLR %P 4588--4634 %U https://proceedings.mlr.press/v336/lee26d.html %V 336 %X We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain \emph{unprojected} TD(0) algorithm with Polyak–Ruppert (PR) averaging, using a \emph{single} stepsize schedule $\eta_t \propto 1/(\tau_{\mathrm{mix}}\log(t)\,\sqrt{t})$ that depends on mixing time but requires \emph{no prior knowledge of the curvature parameter $\omega$}. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded \emph{with high probability}, without projections and without any stability argument based on $\omega$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}/\sqrt{T})$ rate and a fast curvature-dependent $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}^2/(\omega T))$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.
APA
Lee, W. & Orabona, F.. (2026). A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:4588-4634 Available from https://proceedings.mlr.press/v336/lee26d.html.

Related Material