A Coupled Flow Approach to Imitation Learning

Gideon Joseph Freund, Elad Sarafian, Sarit Kraus
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10357-10372, 2023.

Abstract

In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and references to it, along with the related state-action distribution, can be found throughout the literature. Despite its importance, the state distribution is mostly discussed indirectly and theoretically, rather than being modeled explicitly. The reason is an absence of appropriate density estimation tools. In this work, we investigate applications of a normalizing-flow-based model for the aforementioned distributions. In particular, we use a pair of flows coupled through the optimality point of the Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence, for distribution-matching-based imitation learning. Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory and extends naturally to a variety of other settings, including the subsampled and state-only regimes.
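For context, the Donsker-Varadhan representation referenced above is the standard variational form of the KL divergence (this is textbook material, not a derivation specific to the paper):

```latex
\mathrm{KL}(p \,\|\, q) \;=\; \sup_{T}\; \mathbb{E}_{x \sim p}\!\left[T(x)\right] \;-\; \log \mathbb{E}_{x \sim q}\!\left[e^{T(x)}\right],
```

where the supremum is attained (up to an additive constant) at $T^{*}(x) = \log \frac{p(x)}{q(x)}$. This optimality point is what couples the two flows in the abstract's description: each flow models a log-density, and their difference parameterizes the critic $T$.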

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-freund23a,
  title     = {A Coupled Flow Approach to Imitation Learning},
  author    = {Freund, Gideon Joseph and Sarafian, Elad and Kraus, Sarit},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {10357--10372},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/freund23a/freund23a.pdf},
  url       = {https://proceedings.mlr.press/v202/freund23a.html},
  abstract  = {In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and references to it, along with the related state-action distribution, can be found throughout the literature. Despite its importance, the state distribution is mostly discussed indirectly and theoretically, rather than being modeled explicitly. The reason is an absence of appropriate density estimation tools. In this work, we investigate applications of a normalizing-flow-based model for the aforementioned distributions. In particular, we use a pair of flows coupled through the optimality point of the Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence, for distribution-matching-based imitation learning. Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory and extends naturally to a variety of other settings, including the subsampled and state-only regimes.}
}
Endnote
%0 Conference Paper
%T A Coupled Flow Approach to Imitation Learning
%A Gideon Joseph Freund
%A Elad Sarafian
%A Sarit Kraus
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-freund23a
%I PMLR
%P 10357--10372
%U https://proceedings.mlr.press/v202/freund23a.html
%V 202
%X In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and references to it, along with the related state-action distribution, can be found throughout the literature. Despite its importance, the state distribution is mostly discussed indirectly and theoretically, rather than being modeled explicitly. The reason is an absence of appropriate density estimation tools. In this work, we investigate applications of a normalizing-flow-based model for the aforementioned distributions. In particular, we use a pair of flows coupled through the optimality point of the Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence, for distribution-matching-based imitation learning. Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory and extends naturally to a variety of other settings, including the subsampled and state-only regimes.
APA
Freund, G.J., Sarafian, E. & Kraus, S. (2023). A Coupled Flow Approach to Imitation Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10357-10372. Available from https://proceedings.mlr.press/v202/freund23a.html.