Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

Yecheng Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:14639-14663, 2022.

Abstract

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with dynamics or morphologically mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments as well as on high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms prior state-of-art.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-ma22a,
  title = {Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching},
  author = {Ma, Yecheng and Shen, Andrew and Jayaraman, Dinesh and Bastani, Osbert},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {14639--14663},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/ma22a/ma22a.pdf},
  url = {https://proceedings.mlr.press/v162/ma22a.html},
  abstract = {We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with dynamics or morphologically mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments as well as on high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms prior state-of-art.}
}
Endnote
%0 Conference Paper
%T Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
%A Yecheng Ma
%A Andrew Shen
%A Dinesh Jayaraman
%A Osbert Bastani
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-ma22a
%I PMLR
%P 14639--14663
%U https://proceedings.mlr.press/v162/ma22a.html
%V 162
%X We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with dynamics or morphologically mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments as well as on high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms prior state-of-art.
APA
Ma, Y., Shen, A., Jayaraman, D. & Bastani, O. (2022). Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:14639-14663. Available from https://proceedings.mlr.press/v162/ma22a.html.

Related Material