IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic

Stefano Viel, Luca Viano, Volkan Cevher
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:61444-61479, 2025.

Abstract

This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations using a primal-dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework prescribes an actor-critic method with multiple critics. These critics estimate the critic uncertainty and thereby build an optimistic critic, which is fundamental for driving exploration. When instantiated in the tabular setting, SOAR yields a provable algorithm, dubbed FRA, with guarantees matching the best known results in terms of $\epsilon$. In practice, the SOAR template consistently boosts the performance of primal-dual IL algorithms that build on actor-critic routines for the policy updates. Thanks to SOAR, the number of episodes required to achieve the same performance is roughly halved.
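The alternating cost/policy template with an ensemble-based optimistic critic can be illustrated with a toy sketch. This is not the authors' implementation, and every modelling choice below is an assumption: a single-state (bandit) problem, a cost class restricted to the box [-1, 1]^A, a simulated critic ensemble (true cost plus Gaussian noise, standing in for independently trained critics), and a multiplicative-weights (softmax) step standing in for the soft actor update. The names `optimistic_cost`, `primal_dual_bandit`, `beta`, `eta_pi`, and `eta_c` are hypothetical. Because the learner minimizes cost, the sketch takes the optimistic critic to be the ensemble mean minus `beta` times the ensemble standard deviation.

```python
import numpy as np


def optimistic_cost(ensemble: np.ndarray, beta: float) -> np.ndarray:
    """Combine K critic estimates of shape (K, A) into one optimistic estimate.

    Assumption: critics estimate cost (lower is better), so optimism means
    trusting a lower cost where critics disagree: mean - beta * std.
    """
    return ensemble.mean(axis=0) - beta * ensemble.std(axis=0)


def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())  # shift by the max for numerical stability
    return z / z.sum()


def primal_dual_bandit(expert_probs: np.ndarray, n_iters: int = 200,
                       n_critics: int = 5, noise: float = 0.1,
                       beta: float = 1.0, eta_pi: float = 0.5,
                       eta_c: float = 0.5, seed: int = 0) -> np.ndarray:
    """Hypothetical single-state version of an alternating cost/policy loop.

    Returns the average of the policy iterates.
    """
    rng = np.random.default_rng(seed)
    n_actions = expert_probs.shape[0]
    logits = np.zeros(n_actions)      # uniform initial policy
    cost = np.zeros(n_actions)        # cost player's current cost vector
    avg_policy = np.zeros(n_actions)
    for _ in range(n_iters):
        policy = softmax(logits)
        # Cost update: projected gradient ascent on E_pi[c] - E_expert[c];
        # projecting onto the box [-1, 1]^A is a clip.
        cost = np.clip(cost + eta_c * (policy - expert_probs), -1.0, 1.0)
        # Simulated critic ensemble: true cost plus independent noise per critic.
        ensemble = cost + noise * rng.standard_normal((n_critics, n_actions))
        q_opt = optimistic_cost(ensemble, beta)
        # Soft policy step: multiplicative weights against the optimistic cost.
        logits = logits - eta_pi * q_opt
        avg_policy += softmax(logits)
    return avg_policy / n_iters
```

For example, `primal_dual_bandit(np.array([0.7, 0.2, 0.1]))` returns the average of the policy iterates, which is the quantity usually examined in no-regret analyses of such primal-dual loops. In this toy the uncertainty bonus plays no real exploratory role, since every action's cost is observed at each step; it only shows where the optimistic estimate enters the policy update.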

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-viel25a,
  title = {{IL}-{SOAR} : Imitation Learning with Soft Optimistic Actor c{R}itic},
  author = {Viel, Stefano and Viano, Luca and Cevher, Volkan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {61444--61479},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/viel25a/viel25a.pdf},
  url = {https://proceedings.mlr.press/v267/viel25a.html},
  abstract = {This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations using a primal-dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework prescribes an actor-critic method with multiple critics. These critics estimate the critic uncertainty and thereby build an optimistic critic, which is fundamental for driving exploration. When instantiated in the tabular setting, SOAR yields a provable algorithm, dubbed FRA, with guarantees matching the best known results in terms of $\epsilon$. In practice, the SOAR template consistently boosts the performance of primal-dual IL algorithms that build on actor-critic routines for the policy updates. Thanks to SOAR, the number of episodes required to achieve the same performance is roughly halved.}
}
EndNote
%0 Conference Paper
%T IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic
%A Stefano Viel
%A Luca Viano
%A Volkan Cevher
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-viel25a
%I PMLR
%P 61444--61479
%U https://proceedings.mlr.press/v267/viel25a.html
%V 267
%X This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations using a primal-dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework prescribes an actor-critic method with multiple critics. These critics estimate the critic uncertainty and thereby build an optimistic critic, which is fundamental for driving exploration. When instantiated in the tabular setting, SOAR yields a provable algorithm, dubbed FRA, with guarantees matching the best known results in terms of $\epsilon$. In practice, the SOAR template consistently boosts the performance of primal-dual IL algorithms that build on actor-critic routines for the policy updates. Thanks to SOAR, the number of episodes required to achieve the same performance is roughly halved.
APA
Viel, S., Viano, L., & Cevher, V. (2025). IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:61444-61479. Available from https://proceedings.mlr.press/v267/viel25a.html.
