A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging

Wei-Cheng Lee; Francesco Orabona

A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging

Wei-Cheng Lee, Francesco Orabona

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:4588-4634, 2026.

Abstract

We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain \emph{unprojected} TD(0) algorithm with Polyak–Ruppert (PR) averaging, using a \emph{single} stepsize schedule $\eta_t \propto 1/(\tau_{\mathrm{mix}}\log(t)\,\sqrt{t})$ that depends on mixing time but requires \emph{no prior knowledge of the curvature parameter $\omega$}. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded \emph{with high probability}, without projections and without any stability argument based on $\omega$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}/\sqrt{T})$ rate and a fast curvature-dependent $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}^2/(\omega T))$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-lee26d,
  title = 	 {A Single Stepsize Suffices for Unprojected Linear {TD(0)}: Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging},
  author =       {Lee, Wei-Cheng and Orabona, Francesco},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {4588--4634},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/lee26d/lee26d.pdf},
  url = 	 {https://proceedings.mlr.press/v336/lee26d.html},
  abstract = 	 {We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain \emph{unprojected} TD(0) algorithm with Polyak–Ruppert (PR) averaging, using a \emph{single} stepsize schedule $\eta_t \propto 1/(\tau_{\mathrm{mix}}\log(t)\,\sqrt{t})$ that depends on mixing time but requires \emph{no prior knowledge of the curvature parameter $\omega$}. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded \emph{with high probability}, without projections and without any stability argument based on $\omega$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}/\sqrt{T})$ rate and a fast curvature-dependent $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}^2/(\omega T))$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.}
}

Endnote

%0 Conference Paper
%T A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging
%A Wei-Cheng Lee
%A Francesco Orabona
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-lee26d
%I PMLR
%P 4588--4634
%U https://proceedings.mlr.press/v336/lee26d.html
%V 336
%X We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain \emph{unprojected} TD(0) algorithm with Polyak–Ruppert (PR) averaging, using a \emph{single} stepsize schedule $\eta_t \propto 1/(\tau_{\mathrm{mix}}\log(t)\,\sqrt{t})$ that depends on mixing time but requires \emph{no prior knowledge of the curvature parameter $\omega$}. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded \emph{with high probability}, without projections and without any stability argument based on $\omega$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}/\sqrt{T})$ rate and a fast curvature-dependent $\widetilde{\mathcal O}(\tau_{\mathrm{mix}}^2/(\omega T))$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.

APA

Lee, W. & Orabona, F.. (2026). A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak–Ruppert Averaging. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:4588-4634 Available from https://proceedings.mlr.press/v336/lee26d.html.

Related Material

Download PDF