Inverse Optimal Heuristic Control for Imitation Learning

Nathan Ratliff, Brian Ziebart, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, Siddhartha Srinivasa
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:424-431, 2009.

Abstract

Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.
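To make the combination concrete, the following is a minimal, hypothetical sketch of IOHC-style action selection, not the paper's exact formulation: each action's energy sums a learned cost over high-dimensional action features with a cost-to-go heuristic obtained by planning in a low-dimensional projection of the state, and actions are drawn from the resulting Boltzmann distribution. The names `action_features`, `heuristic_cost_to_go`, `w_f`, and `w_h` are illustrative assumptions.

```python
import numpy as np

def iohc_action_distribution(state, actions, w_f, w_h,
                             action_features, heuristic_cost_to_go):
    """Hypothetical IOHC-style policy sketch.

    P(a | s) is a Boltzmann distribution over per-action energies that
    combine a learned linear cost over high-dimensional action features
    (the behavioral-cloning side) with a cost-to-go heuristic computed
    by planning in a low-dimensional space (the IOC side).

    action_features(state, a) -> np.ndarray   # high-dimensional features
    heuristic_cost_to_go(state, a) -> float   # low-dim planner's cost-to-go
    """
    energies = np.array([
        w_f @ action_features(state, a)           # immediate, feature-based cost
        + w_h * heuristic_cost_to_go(state, a)    # long-horizon planning term
        for a in actions
    ])
    # Numerically stable softmax over negative energies:
    # low-cost actions receive the highest probability.
    p = np.exp(-(energies - energies.min()))
    return p / p.sum()
```

Under this sketch, fitting `w_f` and `w_h` by maximizing the likelihood of demonstrated actions would yield an objective of the kind the abstract describes, non-convex overall but sandwiched between convex upper and lower bounds.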

Cite this Paper

BibTeX
@InProceedings{pmlr-v5-ratliff09a,
  title     = {Inverse Optimal Heuristic Control for Imitation Learning},
  author    = {Ratliff, Nathan and Ziebart, Brian and Peterson, Kevin and Bagnell, J. Andrew and Hebert, Martial and Dey, Anind K. and Srinivasa, Siddhartha},
  booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages     = {424--431},
  year      = {2009},
  editor    = {van Dyk, David and Welling, Max},
  volume    = {5},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v5/ratliff09a/ratliff09a.pdf},
  url       = {https://proceedings.mlr.press/v5/ratliff09a.html},
  abstract  = {Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.}
}
Endnote
%0 Conference Paper
%T Inverse Optimal Heuristic Control for Imitation Learning
%A Nathan Ratliff
%A Brian Ziebart
%A Kevin Peterson
%A J. Andrew Bagnell
%A Martial Hebert
%A Anind K. Dey
%A Siddhartha Srinivasa
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-ratliff09a
%I PMLR
%P 424--431
%U https://proceedings.mlr.press/v5/ratliff09a.html
%V 5
%X Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.
RIS
TY  - CPAPER
TI  - Inverse Optimal Heuristic Control for Imitation Learning
AU  - Nathan Ratliff
AU  - Brian Ziebart
AU  - Kevin Peterson
AU  - J. Andrew Bagnell
AU  - Martial Hebert
AU  - Anind K. Dey
AU  - Siddhartha Srinivasa
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-ratliff09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 424
EP  - 431
L1  - http://proceedings.mlr.press/v5/ratliff09a/ratliff09a.pdf
UR  - https://proceedings.mlr.press/v5/ratliff09a.html
AB  - Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.
ER  -
APA
Ratliff, N., Ziebart, B., Peterson, K., Bagnell, J.A., Hebert, M., Dey, A.K. & Srinivasa, S. (2009). Inverse Optimal Heuristic Control for Imitation Learning. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:424-431. Available from https://proceedings.mlr.press/v5/ratliff09a.html.