Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5005-5014, 2018.
Abstract
Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within epsilon-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and Mendelson in the independent covariates setting. Finally, we provide experimental evidence indicating that our analysis correctly captures the qualitative behavior of LSTD on several LQR instances.
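To make the setting concrete, the following is a minimal sketch of LSTD policy evaluation on a small LQR instance. It is not taken from the paper: the discounted-cost formulation, the quadratic-plus-constant feature map, the system matrices, the gain `K`, and all function names are assumptions chosen for illustration. The estimate is compared against the value matrix obtained by solving the discounted Lyapunov equation directly.

```python
import numpy as np

def true_value_matrix(L, M, gamma):
    """Solve P = M + gamma * L^T P L by vectorizing the linear equation."""
    n = L.shape[0]
    lhs = np.eye(n * n) - gamma * np.kron(L.T, L.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)

def features(x):
    """Quadratic features (upper triangle of x x^T) plus a constant term.

    Off-diagonal entries are doubled so that features(x)[:-1] @ w equals
    x^T P x when w holds the upper triangle of a symmetric P.
    """
    n = x.shape[0]
    iu = np.triu_indices(n)
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)
    return np.concatenate([scale * np.outer(x, x)[iu], [1.0]])

def lstd_lqr(L, M, gamma, T, sigma=1.0, seed=0):
    """LSTD estimate of V(x) = x^T P x + c from a single trajectory.

    L is the closed-loop matrix A + B K, and M = Q + K^T R K is the
    per-step cost matrix under the fixed policy u = K x (assumed setup).
    """
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    d = n * (n + 1) // 2 + 1
    A_hat = np.zeros((d, d))
    b_hat = np.zeros(d)
    x = np.zeros(n)
    for _ in range(T):
        x_next = L @ x + sigma * rng.standard_normal(n)
        phi, phi_next = features(x), features(x_next)
        A_hat += np.outer(phi, phi - gamma * phi_next)
        b_hat += phi * (x @ M @ x)
        x = x_next
    w = np.linalg.solve(A_hat, b_hat)
    # Rebuild the symmetric matrix from its upper triangle.
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = w[:-1]
    P = P + P.T - np.diag(np.diag(P))
    return P, w[-1]

# Hypothetical two-state instance with a stabilizing gain K.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.2]])
Q, R, gamma = np.eye(2), np.eye(1), 0.9

L = A + B @ K
M = Q + K.T @ R @ K
P_true = true_value_matrix(L, M, gamma)
P_hat, c_hat = lstd_lqr(L, M, gamma, T=30000)
rel_err = np.linalg.norm(P_hat - P_true) / np.linalg.norm(P_true)
print("relative error:", rel_err)
```

Rerunning with different trajectory lengths `T` gives a rough empirical view of how the relative error shrinks with the number of samples, which is the quantity the finite-time analysis bounds.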