[edit]
Inverse Optimal Heuristic Control for Imitation Learning
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:424-431, 2009.
Abstract
Imitation learning is an increasingly important tool for both developing automatic decision making systems as well as for learning to predict decision-making and behavior by observation. Two basic approaches are common: the first, which we here term behavioral cloning (BC)\citeBehavioralCloning,ALVINN,DAVE, treats the imitation learning problem as a straightforward one of supervised learning (e.g. classification) where the goal is to map observations to controls. Secondly, the notion of inverse optimal control (IOC) \citeBoydIOC,ng00irl,Abbeel04c,mmp06 for modeling such decision making behavior has gained prominence as it allows for learned decision-making that reasons sequentially and over a long horizon. Unfortunately, such inverse optimal control methods rely upon the ability to efficiently solve a planning problem and suffer from the usual “curse of dimensionality” when the state space gets large. This paper presents a novel approach to imitation learning that we call Inverse Optimal Heuristic Control (IOHC) which capitalizes on the strengths of both paradigms by allowing long-horizon, planning style reasoning in a low dimensional space, while enabling a high dimensional additional set of features to guide overall action selection. We frame this combined problem as one of optimization, and although the resulting objective function is actually non-convex, we are able to provide convex upper and lower bounds to optimize as surrogates. Further, these bounds, as well as our empirical results, show that the objective function is nearly convex and leads to improved performance on a set of imitation learning problems including turn prediction of drivers as well as predicting the likely paths taken by pedestrians in an office environment.