Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Proceedings of Machine Learning Research, PMLR 89:2916–2925, 2019.
Abstract
We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative prefactors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of nonconvex optimization problems.
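The abstract names a canonical stochastic two-point derivative-free method without spelling it out. The Python sketch below shows one common form of such a method and is not the paper's code. It rests on several assumptions: the policy is $u_t = -K x_t$; each query returns the infinite-horizon cost $x_0^\top P_K x_0$ from a Gaussian initial state $x_0$ that is shared by the two perturbed queries; search directions are drawn uniformly from the unit sphere; and the gradient estimate is $\frac{d}{2r}\,(f(K + rU; x_0) - f(K - rU; x_0))\,U$. The toy system, step size, smoothing radius, iteration count, and helper names (value_matrix, two_point_step, riccati_optimal_cost, and so on) are all hypothetical choices made for illustration.

```python
import numpy as np


def value_matrix(A, B, Q, R, K):
    """P_K with x0^T P_K x0 = sum_t (x_t^T Q x_t + u_t^T R u_t), or None if unstable."""
    Acl = A - B @ K  # closed loop under the assumed policy u_t = -K x_t
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return None  # infinite cost: K does not stabilize the system
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # Solve the Lyapunov equation P = M + Acl^T P Acl by (row-major) vectorization.
    lhs = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)


def cost(A, B, Q, R, K, x0):
    """Zero-order oracle: the cost of policy K from one initial state x0."""
    P = value_matrix(A, B, Q, R, K)
    return np.inf if P is None else float(x0 @ P @ x0)


def population_cost(A, B, Q, R, K):
    """Expected cost for x0 ~ N(0, I), which equals trace(P_K)."""
    P = value_matrix(A, B, Q, R, K)
    return np.inf if P is None else float(np.trace(P))


def two_point_step(A, B, Q, R, K, rng, radius, step):
    """One step of two-point zero-order gradient descent on K."""
    d = K.size
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)  # uniform direction on the unit sphere (Frobenius norm)
    x0 = rng.standard_normal(A.shape[0])  # one random initial state, shared by both queries
    f_plus = cost(A, B, Q, R, K + radius * U, x0)
    f_minus = cost(A, B, Q, R, K - radius * U, x0)
    if not (np.isfinite(f_plus) and np.isfinite(f_minus)):
        return K  # safeguard added for this sketch: skip queries that leave the stable set
    grad_est = (d / (2.0 * radius)) * (f_plus - f_minus) * U
    return K - step * grad_est


def riccati_optimal_cost(A, B, Q, R, iters=500):
    """Optimal expected cost trace(P*), from value iteration on the Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        S = R + B.T @ P @ B
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
    return float(np.trace(P))


# Hypothetical toy system: A is stable, so K = 0 is a valid stabilizing start.
A = 0.5 * np.eye(2)
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
rng = np.random.default_rng(0)

K = np.zeros((2, 2))
initial = population_cost(A, B, Q, R, K)
for _ in range(3000):
    K = two_point_step(A, B, Q, R, K, rng, radius=0.05, step=0.002)
final = population_cost(A, B, Q, R, K)
optimal = riccati_optimal_cost(A, B, Q, R)
print(f"initial {initial:.4f}  final {final:.4f}  optimal {optimal:.4f}")
```

The optimizer only ever sees scalar cost values. The Riccati solution is not used in the updates; it only measures how close the final policy gets to the optimal linear policy on this toy problem.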