Efficient Reductions for Imitation Learning

Stephane Ross; Drew Bagnell

Efficient Reductions for Imitation Learning

Stephane Ross, Drew Bagnell

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:661-668, 2010.

Abstract

Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d.. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-ross10a,
  title = 	 {Efficient Reductions for Imitation Learning},
  author = 	 {Ross, Stephane and Bagnell, Drew},
  booktitle = 	 {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {661--668},
  year = 	 {2010},
  editor = 	 {Teh, Yee Whye and Titterington, Mike},
  volume = 	 {9},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Chia Laguna Resort, Sardinia, Italy},
  month = 	 {13--15 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v9/ross10a/ross10a.pdf},
  url = 	 {https://proceedings.mlr.press/v9/ross10a.html},
  abstract = 	 {Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d..  This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.}
}

Endnote

%0 Conference Paper
%T Efficient Reductions for Imitation Learning
%A Stephane Ross
%A Drew Bagnell
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington	
%F pmlr-v9-ross10a
%I PMLR
%P 661--668
%U https://proceedings.mlr.press/v9/ross10a.html
%V 9
%X Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d..  This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.

RIS


TY  - CPAPER
TI  - Efficient Reductions for Imitation Learning
AU  - Stephane Ross
AU  - Drew Bagnell
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington	
ID  - pmlr-v9-ross10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 661
EP  - 668
L1  - http://proceedings.mlr.press/v9/ross10a/ross10a.pdf
UR  - https://proceedings.mlr.press/v9/ross10a.html
AB  - Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d..  This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.
ER  -

APA


Ross, S. & Bagnell, D.. (2010). Efficient Reductions for Imitation Learning. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:661-668 Available from https://proceedings.mlr.press/v9/ross10a.html.

Efficient Reductions for Imitation Learning

Abstract

Cite this Paper

Related Material