Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms

Yichen Li, Chicheng Zhang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:29278-29323, 2024.

Abstract

We study interactive imitation learning, where a learner interactively queries a demonstrating expert for action annotations, aiming to learn a policy that has performance competitive with the expert, using as few annotations as possible. We focus on the general agnostic setting where the expert demonstration policy may not be contained in the policy class used by the learner. We propose a new oracle-efficient algorithm MFTPL-P (abbreviation for Mixed Follow the Perturbed Leader with Poisson perturbations) with provable finite-sample guarantees, under the assumption that the learner is given access to samples from some “explorative” distribution over states. Our guarantees hold for any policy class, which is considerably broader than prior state of the art. We further propose Bootstrap-DAgger, a more practical variant that does not require additional sample access.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-li24ck, title = {Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms}, author = {Li, Yichen and Zhang, Chicheng}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {29278--29323}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24ck/li24ck.pdf}, url = {https://proceedings.mlr.press/v235/li24ck.html}, abstract = {We study interactive imitation learning, where a learner interactively queries a demonstrating expert for action annotations, aiming to learn a policy that has performance competitive with the expert, using as few annotations as possible. We focus on the general agnostic setting where the expert demonstration policy may not be contained in the policy class used by the learner. We propose a new oracle-efficient algorithm MFTPL-P (abbreviation for Mixed Follow the Perturbed Leader with Poisson perturbations) with provable finite-sample guarantees, under the assumption that the learner is given access to samples from some “explorative” distribution over states. Our guarantees hold for any policy class, which is considerably broader than prior state of the art. We further propose Bootstrap-DAgger, a more practical variant that does not require additional sample access.} }
Endnote
%0 Conference Paper %T Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms %A Yichen Li %A Chicheng Zhang %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-li24ck %I PMLR %P 29278--29323 %U https://proceedings.mlr.press/v235/li24ck.html %V 235 %X We study interactive imitation learning, where a learner interactively queries a demonstrating expert for action annotations, aiming to learn a policy that has performance competitive with the expert, using as few annotations as possible. We focus on the general agnostic setting where the expert demonstration policy may not be contained in the policy class used by the learner. We propose a new oracle-efficient algorithm MFTPL-P (abbreviation for Mixed Follow the Perturbed Leader with Poisson perturbations) with provable finite-sample guarantees, under the assumption that the learner is given access to samples from some “explorative” distribution over states. Our guarantees hold for any policy class, which is considerably broader than prior state of the art. We further propose Bootstrap-DAgger, a more practical variant that does not require additional sample access.
APA
Li, Y. & Zhang, C.. (2024). Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:29278-29323 Available from https://proceedings.mlr.press/v235/li24ck.html.

Related Material