Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter Bartlett, Martin Wainwright
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2916-2925, 2019.

Abstract

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-malik19a, title = {Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems}, author = {Malik, Dhruv and Pananjady, Ashwin and Bhatia, Kush and Khamaru, Koulik and Bartlett, Peter and Wainwright, Martin}, booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics}, pages = {2916--2925}, year = {2019}, editor = {Chaudhuri, Kamalika and Sugiyama, Masashi}, volume = {89}, series = {Proceedings of Machine Learning Research}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v89/malik19a/malik19a.pdf}, url = {https://proceedings.mlr.press/v89/malik19a.html}, abstract = {We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.} }
Endnote
%0 Conference Paper %T Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems %A Dhruv Malik %A Ashwin Pananjady %A Kush Bhatia %A Koulik Khamaru %A Peter Bartlett %A Martin Wainwright %B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Masashi Sugiyama %F pmlr-v89-malik19a %I PMLR %P 2916--2925 %U https://proceedings.mlr.press/v89/malik19a.html %V 89 %X We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.
APA
Malik, D., Pananjady, A., Bhatia, K., Khamaru, K., Bartlett, P. & Wainwright, M.. (2019). Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:2916-2925 Available from https://proceedings.mlr.press/v89/malik19a.html.

Related Material