Inverse Optimal Heuristic Control for Imitation Learning

Nathan Ratliff, Brian Ziebart, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, Siddhartha Srinivasa
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:424-431, 2009.

Abstract

Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.
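To make the combination concrete, the following is a minimal, hypothetical sketch of IOHC-style action selection, not the paper's exact formulation: each action's energy sums a learned cost over high-dimensional action features with a cost-to-go heuristic obtained by planning in a low-dimensional projection of the state, and actions are drawn from the resulting Boltzmann distribution. The names `action_features`, `heuristic_cost_to_go`, `w_f`, and `w_h` are illustrative assumptions.

```python
import numpy as np

def iohc_action_distribution(state, actions, w_f, w_h,
                             action_features, heuristic_cost_to_go):
    """Hypothetical IOHC-style policy sketch.

    P(a | s) is a Boltzmann distribution over per-action energies that
    combine a learned linear cost over high-dimensional action features
    (the behavioral-cloning side) with a cost-to-go heuristic computed
    by planning in a low-dimensional space (the IOC side).

    action_features(state, a) -> np.ndarray   # high-dimensional features
    heuristic_cost_to_go(state, a) -> float   # low-dim planner's cost-to-go
    """
    energies = np.array([
        w_f @ action_features(state, a)           # immediate, feature-based cost
        + w_h * heuristic_cost_to_go(state, a)    # long-horizon planning term
        for a in actions
    ])
    # Numerically stable softmax over negative energies:
    # low-cost actions receive the highest probability.
    p = np.exp(-(energies - energies.min()))
    return p / p.sum()
```

Under this sketch, fitting `w_f` and `w_h` by maximizing the likelihood of demonstrated actions would yield an objective of the kind the abstract describes, non-convex overall but sandwiched between convex upper and lower bounds.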

Cite this Paper

BibTeX
@InProceedings{pmlr-v5-ratliff09a,
  title     = {Inverse Optimal Heuristic Control for Imitation Learning},
  author    = {Ratliff, Nathan and Ziebart, Brian and Peterson, Kevin and Bagnell, J. Andrew and Hebert, Martial and Dey, Anind K. and Srinivasa, Siddhartha},
  booktitle = {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages     = {424--431},
  year      = {2009},
  editor    = {van Dyk, David and Welling, Max},
  volume    = {5},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v5/ratliff09a/ratliff09a.pdf},
  url       = {https://proceedings.mlr.press/v5/ratliff09a.html},
  abstract  = {Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.}
}
Endnote
%0 Conference Paper
%T Inverse Optimal Heuristic Control for Imitation Learning
%A Nathan Ratliff
%A Brian Ziebart
%A Kevin Peterson
%A J. Andrew Bagnell
%A Martial Hebert
%A Anind K. Dey
%A Siddhartha Srinivasa
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-ratliff09a
%I PMLR
%P 424--431
%U https://proceedings.mlr.press/v5/ratliff09a.html
%V 5
%X Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.
RIS
TY  - CPAPER
TI  - Inverse Optimal Heuristic Control for Imitation Learning
AU  - Nathan Ratliff
AU  - Brian Ziebart
AU  - Kevin Peterson
AU  - J. Andrew Bagnell
AU  - Martial Hebert
AU  - Anind K. Dey
AU  - Siddhartha Srinivasa
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-ratliff09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 424
EP  - 431
L1  - http://proceedings.mlr.press/v5/ratliff09a/ratliff09a.pdf
UR  - https://proceedings.mlr.press/v5/ratliff09a.html
AB  - Imitation learning is an increasingly important tool both for developing automatic decision-making systems and for learning to predict decision-making and behavior from observation. Two basic approaches are common. The first, which we here term behavioral cloning (BC) \cite{BehavioralCloning,ALVINN,DAVE}, treats imitation learning as straightforward supervised learning (e.g., classification) in which the goal is to map observations to controls. The second, inverse optimal control (IOC) \cite{BoydIOC,ng00irl,Abbeel04c,mmp06}, has gained prominence as a model of such decision-making behavior because it allows learned decision making that reasons sequentially over a long horizon. Unfortunately, inverse optimal control methods rely on the ability to solve a planning problem efficiently and suffer from the usual “curse of dimensionality” as the state space grows. This paper presents a novel approach to imitation learning, Inverse Optimal Heuristic Control (IOHC), that capitalizes on the strengths of both paradigms: it permits long-horizon, planning-style reasoning in a low-dimensional space while an additional set of high-dimensional features guides overall action selection. We frame this combined problem as one of optimization; although the resulting objective function is non-convex, we provide convex upper and lower bounds to optimize as surrogates. These bounds, together with our empirical results, show that the objective is nearly convex in practice, and the approach improves performance on a set of imitation learning problems, including predicting the turns drivers take and the paths pedestrians follow through an office environment.
ER  -
APA
Ratliff, N., Ziebart, B., Peterson, K., Bagnell, J.A., Hebert, M., Dey, A.K. & Srinivasa, S. (2009). Inverse Optimal Heuristic Control for Imitation Learning. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:424-431. Available from https://proceedings.mlr.press/v5/ratliff09a.html.