Provable Representation Learning for Imitation Learning via Bi-level Optimization

Sanjeev Arora, Simon Du, Sham Kakade, Yuping Luo, Nikunj Saunshi
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:367-376, 2020.

Abstract

A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts’ trajectories are available. We formulate representation learning as a bi-level optimization problem where the “outer” optimization tries to learn the joint representation and the “inner” optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.
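
The bi-level structure described in the abstract can be sketched as follows. This is only an illustrative reading in assumed notation: the symbols $\phi$, $\Phi$, $\pi_t$, $\Pi$, $\ell_t$, and $T$ do not appear on this page. With $T$ tasks (one per expert), the outer problem chooses a shared representation $\phi$, and each inner problem fits a task-specific policy $\pi_t$ on top of it by minimizing an imitation loss $\ell_t$, for example a behavior cloning loss on expert $t$'s trajectories.

```latex
% Sketch under assumed notation (not taken from the page):
%   \phi   shared representation, chosen by the outer problem
%   \pi_t  task-specific policy for task t, chosen by the inner problem
%   \ell_t imitation loss for task t (e.g. behavior cloning)
\min_{\phi \in \Phi} \; \frac{1}{T} \sum_{t=1}^{T} \; \min_{\pi_t \in \Pi} \; \ell_t(\pi_t \circ \phi)
```

Under this reading, a new task would reuse the learned representation and fit only its own $\pi$, which is where a sample complexity benefit could plausibly come from.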

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-arora20a,
  title = {Provable Representation Learning for Imitation Learning via Bi-level Optimization},
  author = {Arora, Sanjeev and Du, Simon and Kakade, Sham and Luo, Yuping and Saunshi, Nikunj},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {367--376},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/arora20a/arora20a.pdf},
  url = {https://proceedings.mlr.press/v119/arora20a.html},
  abstract = {A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts’ trajectories are available. We formulate representation learning as a bi-level optimization problem where the “outer” optimization tries to learn the joint representation and the “inner” optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.}
}
Endnote
%0 Conference Paper
%T Provable Representation Learning for Imitation Learning via Bi-level Optimization
%A Sanjeev Arora
%A Simon Du
%A Sham Kakade
%A Yuping Luo
%A Nikunj Saunshi
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-arora20a
%I PMLR
%P 367--376
%U https://proceedings.mlr.press/v119/arora20a.html
%V 119
%X A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts’ trajectories are available. We formulate representation learning as a bi-level optimization problem where the “outer” optimization tries to learn the joint representation and the “inner” optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.
APA
Arora, S., Du, S., Kakade, S., Luo, Y. & Saunshi, N. (2020). Provable Representation Learning for Imitation Learning via Bi-level Optimization. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:367-376. Available from https://proceedings.mlr.press/v119/arora20a.html.
