Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

Angeliki Kamoutsi, Goran Banjac, John Lygeros
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5257-5268, 2021.

Abstract

We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to a reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees but are computationally expensive, while state-of-the-art policy optimization algorithms achieve significant empirical success but are hampered by limited theoretical understanding. To bridge the gap between theory and practice, we introduce a novel bilinear saddle-point framework using Lagrangian duality. The proposed primal-dual viewpoint allows us to develop a model-free provably efficient algorithm through the lens of stochastic convex optimization. The method enjoys the advantages of simplicity of implementation, low memory requirements, and computational and sample complexities independent of the number of states. We further present an equivalent no-regret online-learning interpretation.
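To make the framing concrete, here is a plausible sketch of the kind of bilinear saddle point the abstract alludes to, assuming the standard linear-programming formulation of discounted MDPs. The notation (occupancy measure \mu, value-like multipliers v, cost class \mathcal{C}, transition kernel P, initial distribution \nu_0, discount \gamma, empirical expert occupancy \hat{\mu}_E) is ours and not necessarily the paper's:

L(\mu, v, c) = \langle c,\, \mu - \hat{\mu}_E \rangle + \sum_{s} v(s) \Big[ (1-\gamma)\,\nu_0(s) + \gamma \sum_{s',a'} P(s \mid s', a')\, \mu(s',a') - \sum_{a} \mu(s,a) \Big],

with the learner solving \min_{\mu \ge 0} \max_{c \in \mathcal{C},\, v} L(\mu, v, c). The objective is bilinear in the primal variable \mu and the dual pair (v, c), so gradient descent-ascent applies, and sampled transitions yield unbiased gradient estimates without a model of P, which is what makes a model-free stochastic method possible.

Below is a minimal toy sketch in Python of gradient descent-ascent on this Lagrangian. It uses exact gradients on a small random MDP purely for illustration; the paper's algorithm is stochastic and model-free, and none of the names or constants here come from the paper.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s']: transition kernel
nu0 = np.ones(nS) / nS                           # initial state distribution
mu_E = rng.dirichlet(np.ones(nS * nA))           # placeholder expert occupancy

mu = np.ones(nS * nA) / (nS * nA)                # primal: occupancy measure over (s, a)
v = np.zeros(nS)                                 # dual: value-like multipliers
c = np.zeros(nS * nA)                            # dual: adversarial cost, box [-1, 1]
eta = 0.1                                        # step size

def flow_residual(mu):
    # Bellman flow constraint violation: (1-gamma)*nu0 + gamma*P^T mu - outflow
    M = mu.reshape(nS, nA)
    inflow = gamma * np.einsum("sap,sa->p", P, M)
    return (1 - gamma) * nu0 + inflow - M.sum(axis=1)

for t in range(5000):
    # grad_mu L = c + gamma * P v - v (broadcast over actions)
    g_mu = c + (gamma * P @ v - v[:, None]).reshape(-1)
    mu *= np.exp(-eta * g_mu)                    # exponentiated (mirror) descent step
    mu /= mu.sum()                               # project back onto the simplex
    v += eta * flow_residual(mu)                 # ascent on constraint multipliers
    c = np.clip(c + eta * (mu - mu_E), -1.0, 1.0)  # ascent on cost, box projection

The box constraint on c stands in for whatever cost class the paper actually uses; a richer class would change only the projection step in the last line.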

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-kamoutsi21a,
  title     = {Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations},
  author    = {Kamoutsi, Angeliki and Banjac, Goran and Lygeros, John},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5257--5268},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/kamoutsi21a/kamoutsi21a.pdf},
  url       = {https://proceedings.mlr.press/v139/kamoutsi21a.html}
}
Endnote
%0 Conference Paper
%T Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations
%A Angeliki Kamoutsi
%A Goran Banjac
%A John Lygeros
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-kamoutsi21a
%I PMLR
%P 5257--5268
%U https://proceedings.mlr.press/v139/kamoutsi21a.html
%V 139
APA
Kamoutsi, A., Banjac, G. & Lygeros, J. (2021). Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5257-5268. Available from https://proceedings.mlr.press/v139/kamoutsi21a.html.