Cross-domain Imitation from Observations

Dripta S. Raychaudhuri, Sujoy Paul, Jeroen Vanbaar, Amit K. Roy-Chowdhury
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8902-8912, 2021.

Abstract

Imitation learning seeks to circumvent the difficulty of designing proper reward functions for training agents by instead utilizing expert behavior. With environments modeled as Markov Decision Processes (MDPs), most existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which the new imitation policy is to be learned. In this paper, we study how to imitate tasks when discrepancies exist between the expert and agent MDPs. These discrepancies across domains could include differing dynamics, viewpoint, or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain to learn this correspondence. To do so, we utilize a cycle-consistency constraint on both the state space and a domain-agnostic latent space. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, aligning the trajectories across the two domains. Once this correspondence is found, we can directly transfer demonstrations from one domain to the other and use them for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.
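The cycle-consistency idea at the heart of the abstract can be illustrated with a minimal, hypothetical sketch: two learned state maps, F (expert to agent) and G (agent to expert), are scored by how well composing them recovers the original states in each direction. The linear maps, state dimensions, and batch sizes below are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state dimensions for the two domains (illustrative only).
d_expert, d_agent = 4, 6

def cycle_loss(F, G, expert_states, agent_states):
    """Mean squared cycle-consistency error in both directions:
    G(F(x)) should recover expert states x, and F(G(y)) should
    recover agent states y."""
    x_cyc = expert_states @ F.T @ G.T   # G(F(x)), shape (N, d_expert)
    y_cyc = agent_states @ G.T @ F.T    # F(G(y)), shape (N, d_agent)
    return (np.mean((x_cyc - expert_states) ** 2)
            + np.mean((y_cyc - agent_states) ** 2))

# Unpaired batches of states from the two domains.
expert_states = rng.normal(size=(32, d_expert))
agent_states = rng.normal(size=(32, d_agent))

# Arbitrary (random) maps incur a large cycle penalty...
F_rand = rng.normal(size=(d_agent, d_expert)) * 0.1
G_rand = rng.normal(size=(d_expert, d_agent)) * 0.1
loss_random = cycle_loss(F_rand, G_rand, expert_states, agent_states)

# ...while a consistent pair (G a left inverse of F) drives the
# expert-direction term to exactly zero.
F_good = np.eye(d_agent, d_expert)   # embed expert states in agent space
G_good = np.eye(d_expert, d_agent)   # project back to expert space
loss_good = cycle_loss(F_good, G_good, expert_states, agent_states)

print(loss_good < loss_random)  # the loss favors consistent correspondences
```

Minimizing such a loss over parameterized maps (together with the latent-space and temporal-position consistency terms the abstract mentions) is what drives the learned correspondence toward one that transfers demonstrations faithfully.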

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-raychaudhuri21a,
  title     = {Cross-domain Imitation from Observations},
  author    = {Raychaudhuri, Dripta S. and Paul, Sujoy and Vanbaar, Jeroen and Roy-Chowdhury, Amit K.},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8902--8912},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/raychaudhuri21a/raychaudhuri21a.pdf},
  url       = {https://proceedings.mlr.press/v139/raychaudhuri21a.html},
  abstract  = {Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDP), most of the existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we study the problem of how to imitate tasks when discrepancies exist between the expert and agent MDP. These discrepancies across domains could include differing dynamics, viewpoint, or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain, to learn this correspondence. We utilize a cycle-consistency constraint on both the state space and a domain agnostic latent space to do this. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, to align the trajectories across the two domains. Once this correspondence is found, we can directly transfer the demonstrations on one domain to the other and use it for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.}
}
Endnote
%0 Conference Paper
%T Cross-domain Imitation from Observations
%A Dripta S. Raychaudhuri
%A Sujoy Paul
%A Jeroen Vanbaar
%A Amit K. Roy-Chowdhury
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-raychaudhuri21a
%I PMLR
%P 8902--8912
%U https://proceedings.mlr.press/v139/raychaudhuri21a.html
%V 139
%X Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDP), most of the existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we study the problem of how to imitate tasks when discrepancies exist between the expert and agent MDP. These discrepancies across domains could include differing dynamics, viewpoint, or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain, to learn this correspondence. We utilize a cycle-consistency constraint on both the state space and a domain agnostic latent space to do this. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, to align the trajectories across the two domains. Once this correspondence is found, we can directly transfer the demonstrations on one domain to the other and use it for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.
APA
Raychaudhuri, D.S., Paul, S., Vanbaar, J. & Roy-Chowdhury, A.K. (2021). Cross-domain Imitation from Observations. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8902-8912. Available from https://proceedings.mlr.press/v139/raychaudhuri21a.html.