The Pitfalls of Imitation Learning when Actions are Continuous

Max Simchowitz, Daniel Pfrommer, Ali Jadbabaie
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:5248-5351, 2025.

Abstract

We study the problem of imitating an expert demonstrator in a discrete-time, continuous state-and-action space control system. We show that there exist stable dynamics (i.e. contracting exponentially quickly) and smooth, deterministic experts such that any smooth, deterministic imitator policy necessarily suffers error on execution that is exponentially larger, as a function of problem horizon, than the error under the distribution of expert training data. Our negative result applies to both behavior cloning and offline-RL algorithms, unless they produce highly improper imitator policies — those which are non-smooth, non-Markovian, or which exhibit highly state-dependent stochasticity — or unless the expert trajectory distribution is sufficiently spread. We provide preliminary evidence of the benefits of these more complex policy parameterizations, explicating the benefits of today’s popular policy parameterizations in robot learning (e.g. action-chunking and diffusion-policies). We also establish a host of complementary negative and positive results for imitation in control systems.
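In symbols, the headline lower bound can be sketched as follows. The notation here (horizon $H$, expert policy $\pi^\star$, imitator $\hat{\pi}$, and state distributions $d^{\pi^\star}$, $d^{\hat{\pi}}$ over expert and imitator rollouts) is illustrative shorthand and not necessarily the paper's exact formalism: there exist exponentially contracting dynamics and a smooth, deterministic expert such that every smooth, deterministic, Markovian imitator $\hat{\pi}$ satisfies, for some constant $c > 0$,

$$
\underbrace{\mathbb{E}_{x \sim d^{\hat{\pi}}}\big[\|\hat{\pi}(x) - \pi^\star(x)\|\big]}_{\text{error on execution (imitator's own rollout)}}
\;\ge\; e^{cH} \cdot
\underbrace{\mathbb{E}_{x \sim d^{\pi^\star}}\big[\|\hat{\pi}(x) - \pi^\star(x)\|\big]}_{\text{error under expert training data}},
$$

i.e. the error incurred when the imitator is actually rolled out is exponentially larger in the horizon than the error measured on the expert's own trajectory distribution.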

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-simchowitz25a,
  title     = {The Pitfalls of Imitation Learning when Actions are Continuous},
  author    = {Simchowitz, Max and Pfrommer, Daniel and Jadbabaie, Ali},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {5248--5351},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/simchowitz25a/simchowitz25a.pdf},
  url       = {https://proceedings.mlr.press/v291/simchowitz25a.html},
  abstract  = {We study the problem of imitating an expert demonstrator in a discrete-time, continuous state-and-action space control system. We show that there exist stable dynamics (i.e. contracting exponentially quickly) and smooth, deterministic experts such that any smooth, deterministic imitator policy necessarily suffers error on execution that is exponentially larger, as a function of problem horizon, than the error under the distribution of expert training data. Our negative result applies to both behavior cloning and offline-RL algorithms, unless they produce highly \emph{improper} imitator policies — those which are non-smooth, non-Markovian, or which exhibit highly state-dependent stochasticity — or unless the expert trajectory distribution is sufficiently spread. We provide preliminary evidence of the benefits of these more complex policy parameterizations, explicating the benefits of today’s popular policy parameterizations in robot learning (e.g. action-chunking and diffusion-policies). We also establish a host of complementary negative and positive results for imitation in control systems.}
}
Endnote
%0 Conference Paper
%T The Pitfalls of Imitation Learning when Actions are Continuous
%A Max Simchowitz
%A Daniel Pfrommer
%A Ali Jadbabaie
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-simchowitz25a
%I PMLR
%P 5248--5351
%U https://proceedings.mlr.press/v291/simchowitz25a.html
%V 291
%X We study the problem of imitating an expert demonstrator in a discrete-time, continuous state-and-action space control system. We show that there exist stable dynamics (i.e. contracting exponentially quickly) and smooth, deterministic experts such that any smooth, deterministic imitator policy necessarily suffers error on execution that is exponentially larger, as a function of problem horizon, than the error under the distribution of expert training data. Our negative result applies to both behavior cloning and offline-RL algorithms, unless they produce highly \emph{improper} imitator policies — those which are non-smooth, non-Markovian, or which exhibit highly state-dependent stochasticity — or unless the expert trajectory distribution is sufficiently spread. We provide preliminary evidence of the benefits of these more complex policy parameterizations, explicating the benefits of today’s popular policy parameterizations in robot learning (e.g. action-chunking and diffusion-policies). We also establish a host of complementary negative and positive results for imitation in control systems.
APA
Simchowitz, M., Pfrommer, D. & Jadbabaie, A. (2025). The Pitfalls of Imitation Learning when Actions are Continuous. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:5248-5351. Available from https://proceedings.mlr.press/v291/simchowitz25a.html.
