Provably Efficient Adversarial Imitation Learning with Unknown Transitions

Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2367-2378, 2023.

Abstract

Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{\mathcal{O}} (H^{3/2} |\mathcal{S}|/\varepsilon)$ and interaction complexity of $\widetilde{\mathcal{O}} (H^{3} |\mathcal{S}|^2 |\mathcal{A}|/\varepsilon^2)$. Here, $H$ represents the planning horizon, $|\mathcal{S}|$ is the state space size, $|\mathcal{A}|$ is the action space size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting and improves upon the interaction complexity of the best-known algorithm, OAL, by $\mathcal{O} (H)$. Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of $|\mathcal{S}|$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-xu23c, title = {Provably Efficient Adversarial Imitation Learning with Unknown Transitions}, author = {Xu, Tian and Li, Ziniu and Yu, Yang and Luo, Zhi-Quan}, booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence}, pages = {2367--2378}, year = {2023}, editor = {Evans, Robin J. and Shpitser, Ilya}, volume = {216}, series = {Proceedings of Machine Learning Research}, month = {31 Jul--04 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v216/xu23c/xu23c.pdf}, url = {https://proceedings.mlr.press/v216/xu23c.html}, abstract = {Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{\mathcal{O}} (H^{3/2} |\mathcal{S}|/\varepsilon)$ and interaction complexity of $\widetilde{\mathcal{O}} (H^{3} |\mathcal{S}|^2 |\mathcal{A}|/\varepsilon^2)$. Here, $H$ represents the planning horizon, $|\mathcal{S}|$ is the state space size, $|\mathcal{A}|$ is the action space size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting and improves upon the interaction complexity of the best-known algorithm, OAL, by $\mathcal{O} (H)$. Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of $|\mathcal{S}|$.} }
Endnote
%0 Conference Paper %T Provably Efficient Adversarial Imitation Learning with Unknown Transitions %A Tian Xu %A Ziniu Li %A Yang Yu %A Zhi-Quan Luo %B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2023 %E Robin J. Evans %E Ilya Shpitser %F pmlr-v216-xu23c %I PMLR %P 2367--2378 %U https://proceedings.mlr.press/v216/xu23c.html %V 216 %X Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{\mathcal{O}} (H^{3/2} |\mathcal{S}|/\varepsilon)$ and interaction complexity of $\widetilde{\mathcal{O}} (H^{3} |\mathcal{S}|^2 |\mathcal{A}|/\varepsilon^2)$. Here, $H$ represents the planning horizon, $|\mathcal{S}|$ is the state space size, $|\mathcal{A}|$ is the action space size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting and improves upon the interaction complexity of the best-known algorithm, OAL, by $\mathcal{O} (H)$. Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of $|\mathcal{S}|$.
APA
Xu, T., Li, Z., Yu, Y. & Luo, Z.. (2023). Provably Efficient Adversarial Imitation Learning with Unknown Transitions. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2367-2378 Available from https://proceedings.mlr.press/v216/xu23c.html.

Related Material