Causal Imitation Learning under Temporally Correlated Noise

Gokul Swamy, Sanjiban Choudhury, Drew Bagnell, Steven Wu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20877-20890, 2022.

Abstract

We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique of econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks.
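The abstract's core idea, using instrumental variable regression to break a spurious state-action correlation, can be illustrated with a minimal sketch on a scalar toy problem. This is textbook two-stage least squares on synthetic data, not the paper's DoubIL or ResiduIL algorithms. The data-generating process, the variable names (`z`, `u`, `x`, `y`), and the helper functions are all assumptions made for illustration.

```python
import numpy as np


def ols(x, y):
    """Least-squares slope of y on x (no intercept; inputs are zero-mean)."""
    return float(x @ y / (x @ x))


def two_stage_least_squares(z, x, y):
    """Textbook 2SLS: regress x on z, then regress y on the fitted x."""
    x_hat = ols(z, x) * z          # stage 1: part of x explained by the instrument
    return ols(x_hat, y)           # stage 2: regress y on that part only


# Hypothetical synthetic data (illustrative, not from the paper).
rng = np.random.default_rng(0)
n = 50_000
true_slope = -0.5                  # assumed linear "expert policy": action = -0.5 * state

z = rng.normal(size=n)             # instrument: influences the state, independent of the noise
u = rng.normal(size=n)             # noise that corrupts both the state and the recorded action
x = z + u + 0.1 * rng.normal(size=n)   # observed state
y = true_slope * x + u             # recorded (noise-corrupted) expert action

bc_slope = ols(x, y)               # plain regression, analogous to behavioral cloning
iv_slope = two_stage_least_squares(z, x, y)

print(f"plain regression slope: {bc_slope:.3f}")
print(f"2SLS slope:             {iv_slope:.3f}")
```

Under these assumptions, plain regression absorbs the shared noise `u` into its slope estimate, while the two-stage estimate uses only the variation in `x` explained by `z` and so lands close to `true_slope`.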

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-swamy22a,
  title = {Causal Imitation Learning under Temporally Correlated Noise},
  author = {Swamy, Gokul and Choudhury, Sanjiban and Bagnell, Drew and Wu, Steven},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {20877--20890},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/swamy22a/swamy22a.pdf},
  url = {https://proceedings.mlr.press/v162/swamy22a.html},
  abstract = {We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique of econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks.}
}
Endnote
%0 Conference Paper
%T Causal Imitation Learning under Temporally Correlated Noise
%A Gokul Swamy
%A Sanjiban Choudhury
%A Drew Bagnell
%A Steven Wu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-swamy22a
%I PMLR
%P 20877--20890
%U https://proceedings.mlr.press/v162/swamy22a.html
%V 162
%X We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique of econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks.
APA
Swamy, G., Choudhury, S., Bagnell, D. & Wu, S. (2022). Causal Imitation Learning under Temporally Correlated Noise. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20877-20890. Available from https://proceedings.mlr.press/v162/swamy22a.html.

Related Material