Provably Efficient Imitation Learning from Observation Alone

Wen Sun, Anirudh Vemula, Byron Boots, Drew Bagnell
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6036-6045, 2019.

Abstract

We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL provably learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results that typically only consider tabular RL settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
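The core objective described above, matching the learner's observation distribution to the expert's under an Integral Probability Metric (IPM), can be illustrated with a toy sketch. This is not the paper's FAIL algorithm; it is a minimal, hypothetical example assuming the simplest possible function class, {f(x) = w·x : ||w||₂ ≤ 1}, for which the IPM has a closed form (the norm of the difference of empirical means, by Cauchy-Schwarz):

```python
import numpy as np

def linear_ipm(obs_p, obs_q):
    """IPM between two empirical observation distributions under the
    function class {f(x) = w . x : ||w||_2 <= 1}.

    sup_{||w||<=1} (E_p[w.x] - E_q[w.x]) = || mean(obs_p) - mean(obs_q) ||_2

    obs_p, obs_q: arrays of shape (n_samples, obs_dim).
    """
    return float(np.linalg.norm(obs_p.mean(axis=0) - obs_q.mean(axis=0)))

def pick_policy(candidate_obs, expert_obs):
    """Toy per-time-step selection: among candidate policies (each
    represented here only by the observations it induces), return the
    index of the one whose observation distribution is closest to the
    expert's in IPM. Hypothetical helper, not the paper's procedure."""
    scores = [linear_ipm(obs, expert_obs) for obs in candidate_obs]
    return int(np.argmin(scores))
```

In FAIL proper the function class is richer (adversarially chosen discriminators) and policies are learned forward in time, one time step at a time; the sketch only conveys why matching observation distributions requires no expert actions.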

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-sun19b,
  title = {Provably Efficient Imitation Learning from Observation Alone},
  author = {Sun, Wen and Vemula, Anirudh and Boots, Byron and Bagnell, Drew},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {6036--6045},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/sun19b/sun19b.pdf},
  url = {https://proceedings.mlr.press/v97/sun19b.html},
  abstract = {We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL provably learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results that typically only consider tabular RL settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.}
}
Endnote
%0 Conference Paper
%T Provably Efficient Imitation Learning from Observation Alone
%A Wen Sun
%A Anirudh Vemula
%A Byron Boots
%A Drew Bagnell
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-sun19b
%I PMLR
%P 6036--6045
%U https://proceedings.mlr.press/v97/sun19b.html
%V 97
%X We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL provably learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results that typically only consider tabular RL settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
APA
Sun, W., Vemula, A., Boots, B. &amp; Bagnell, D. (2019). Provably Efficient Imitation Learning from Observation Alone. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6036-6045. Available from https://proceedings.mlr.press/v97/sun19b.html.