Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou; Ashwin Pananjady; Martin Wainwright; Peter Bartlett

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:2060-2061, 2022.

Abstract

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).
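As a rough illustration of the setting the abstract describes, the sketch below runs TD(0) (the $\lambda = 0$ member of the TD($\lambda$) family) along a single trajectory of a two-state Markov chain and then averages the iterates collected after a burn-in period. Everything specific here is an assumption made for the example and is not taken from the paper: the transition probabilities, rewards, discount factor, constant step size, burn-in length, and the function names `averaged_td0` and `exact_values`.

```python
import random

# Hypothetical two-state Markov reward process (not from the paper).
P_STAY = [0.9, 0.8]   # probability of remaining in state 0 / state 1
REWARD = [1.0, 0.0]   # deterministic reward collected in each state


def averaged_td0(n_steps=200_000, burn_in=100_000, step_size=0.01,
                 gamma=0.5, seed=0):
    """Tabular TD(0) on one Markov trajectory, then average the iterates
    observed after the burn-in period."""
    rng = random.Random(seed)
    v = [0.0, 0.0]       # current iterate (value estimate per state)
    v_sum = [0.0, 0.0]   # running sum of post-burn-in iterates
    s = 0
    for t in range(n_steps):
        # Sample the next state of the chain.
        s_next = s if rng.random() < P_STAY[s] else 1 - s
        # Stochastic approximation step toward the fixed point
        # v = r + gamma * P v, using only the observed transition.
        td_error = REWARD[s] + gamma * v[s_next] - v[s]
        v[s] += step_size * td_error
        if t >= burn_in:
            v_sum[0] += v[0]
            v_sum[1] += v[1]
        s = s_next
    n_avg = n_steps - burn_in
    return [v_sum[0] / n_avg, v_sum[1] / n_avg]


def exact_values(gamma=0.5):
    """Solve (I - gamma * P) v = r exactly for the 2x2 chain (Cramer's rule)."""
    a = 1.0 - gamma * P_STAY[0]
    b = -gamma * (1.0 - P_STAY[0])
    c = -gamma * (1.0 - P_STAY[1])
    d = 1.0 - gamma * P_STAY[1]
    det = a * d - b * c
    r0, r1 = REWARD
    return [(d * r0 - b * r1) / det, (a * r1 - c * r0) / det]


if __name__ == "__main__":
    print("averaged TD(0):", averaged_td0())
    print("exact solution:", exact_values())
```

Comparing the two printed vectors gives a feel for how the averaged iterate tracks the exact fixed point on this toy chain; it does not reproduce the paper's step-size schedule, its bounds, or its lower-bound construction.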

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-mou22a,
  title     = {Optimal and instance-dependent guarantees for Markovian linear stochastic approximation},
  author    = {Mou, Wenlong and Pananjady, Ashwin and Wainwright, Martin and Bartlett, Peter},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {2060--2061},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/mou22a/mou22a.pdf},
  url       = {https://proceedings.mlr.press/v178/mou22a.html},
  abstract = 	 {We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain.  We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time.  We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise—covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$—and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).}
}

Endnote

%0 Conference Paper
%T Optimal and instance-dependent guarantees for Markovian linear stochastic approximation
%A Wenlong Mou
%A Ashwin Pananjady
%A Martin Wainwright
%A Peter Bartlett
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-mou22a
%I PMLR
%P 2060--2061
%U https://proceedings.mlr.press/v178/mou22a.html
%V 178
%X We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain.  We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time.  We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise—covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$—and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).

APA


Mou, W., Pananjady, A., Wainwright, M., & Bartlett, P. (2022). Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:2060-2061. Available from https://proceedings.mlr.press/v178/mou22a.html.
