IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:61444-61479, 2025.
Abstract
This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations via a primal-dual style algorithm alternating cost and policy updates. Within the policy updates, the SOAR framework prescribes using an actor-critic method with multiple critics to estimate the critic uncertainty and thereby build an optimistic critic, which is fundamental to driving exploration. Instantiating the template in the tabular setting yields a provable algorithm, dubbed FRA, with guarantees matching the best known results in $\epsilon$. In practice, the SOAR template is shown to consistently boost the performance of primal-dual IL algorithms that build on actor-critic routines for the policy updates. With SOAR, the number of episodes required to reach the same performance is reduced by approximately half.
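To make the idea of an optimistic critic built from multiple critics concrete, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm: the abstract does not specify how the critics are combined, so the rule used here (ensemble mean shifted by a multiple of the ensemble standard deviation) is an assumption, and the names `optimistic_estimate`, `bonus_scale`, and `maximize` are hypothetical.

```python
from statistics import mean, pstdev


def optimistic_estimate(critic_values, bonus_scale=1.0, maximize=True):
    """Combine several critic estimates for one state-action pair.

    Assumption (not taken from the paper): the disagreement between
    critics, measured by their population standard deviation, is used
    as an uncertainty proxy and applied as an exploration bonus.
    """
    if not critic_values:
        raise ValueError("need at least one critic estimate")
    center = mean(critic_values)
    spread = pstdev(critic_values)  # zero when all critics agree
    bonus = bonus_scale * spread
    # Optimism raises an estimate of return, and lowers an estimate of cost.
    return center + bonus if maximize else center - bonus


if __name__ == "__main__":
    # Three hypothetical critics that disagree about the same pair.
    print(optimistic_estimate([1.0, 2.0, 3.0], bonus_scale=0.5))
```

When the critics agree, the bonus vanishes and the estimate reduces to their common value, so under this assumed rule the extra exploration incentive applies only where the ensemble is uncertain. Setting `maximize=False` flips the sign of the bonus for the case where the critics estimate a cost to be minimized.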