Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Gal Dalal; Gugan Thoppe; Balázs Szörényi; Shie Mannor

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Gal Dalal, Gugan Thoppe, Balázs Szörényi, Shie Mannor

Proceedings of the 31st Conference On Learning Theory, PMLR 75:1199-1233, 2018.

Abstract

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as “lock-in probability”. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection. As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-dalal18a,
  title = 	 {Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning},
  author =       {Dalal, Gal and Thoppe, Gugan and Sz{\"o}r{\'e}nyi, Bal{\'a}zs and Mannor, Shie},
  booktitle = 	 {Proceedings of the 31st  Conference On Learning Theory},
  pages = 	 {1199--1233},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/dalal18a/dalal18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/dalal18a.html},
  abstract = 	 {Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as “lock-in probability”. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection.  As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.}
}

Endnote

%0 Conference Paper
%T Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning
%A Gal Dalal
%A Gugan Thoppe
%A Balázs Szörényi
%A Shie Mannor
%B Proceedings of the 31st  Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet	
%F pmlr-v75-dalal18a
%I PMLR
%P 1199--1233
%U https://proceedings.mlr.press/v75/dalal18a.html
%V 75
%X Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as “lock-in probability”. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection.  As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.

APA


Dalal, G., Thoppe, G., Szörényi, B. & Mannor, S.. (2018). Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning. Proceedings of the 31st  Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1199-1233 Available from https://proceedings.mlr.press/v75/dalal18a.html.

Related Material

Download PDF