Non-asymptotic Analysis of Biased Stochastic Approximation Scheme
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1944-1974, 2019.
Abstract
Stochastic approximation (SA) is a key method used in statistical learning. Recently, its non-asymptotic convergence analysis has been considered in many papers. However, most of the prior analyses are made under restrictive assumptions such as unbiased gradient estimates and convex objective function, which significantly limit their applications to sophisticated tasks such as online and reinforcement learning. These restrictions are all essentially relaxed in this work. In particular, we analyze a general SA scheme to minimize a non-convex, smooth objective function. We consider an update procedure whose drift term depends on a state-dependent Markov chain and whose mean field is not necessarily of gradient type, covering approximate second-order methods and allowing asymptotic bias for the one-step updates. We illustrate these settings with the online EM algorithm and the policy-gradient method for average reward maximization in reinforcement learning.
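As a rough illustration of the setting described in the abstract (not code from the paper), the sketch below runs a basic SA recursion theta_{k+1} = theta_k - gamma_{k+1} * H(theta_k, X_{k+1}) on a simple quadratic objective, where the noise X_k evolves as a Markov (AR(1)) chain, so the one-step drift is biased conditional on the current state; all function names and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def markov_noise(x):
    # AR(1) Markov chain with stationary mean 0; successive samples are
    # correlated, so each one-step update is biased given the current state.
    return 0.5 * x + rng.normal(scale=0.5)

def sa_minimize(theta0, n_iters=20000):
    """Biased SA recursion for f(theta) = theta**2 / 2, whose mean field
    is h(theta) = theta. The Markovian noise makes each drift sample biased,
    but the diminishing step sizes average the bias out over time."""
    theta, x = theta0, 0.0
    for k in range(1, n_iters + 1):
        x = markov_noise(x)           # advance the underlying Markov chain
        gamma = 1.0 / k**0.6          # diminishing step size gamma_{k+1}
        theta -= gamma * (theta + x)  # drift = mean field + Markovian noise
    return theta

print(sa_minimize(5.0))  # iterate settles near the minimizer theta* = 0
```

The point of the toy example is only that convergence can still occur even though E[H(theta, X_{k+1}) | X_k] differs from the mean field h(theta) at every step, which is the kind of bias the paper's analysis quantifies non-asymptotically.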