Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

Zakaria Mhammedi, Dylan J Foster, Alexander Rakhlin
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24659-24700, 2023.

Abstract

We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the current action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis of MusIK leverages several new techniques tailored to non-optimistic algorithms for reward-free exploration, which we anticipate will find broader use.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-mhammedi23a, title = {Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation {RL}}, author = {Mhammedi, Zakaria and Foster, Dylan J and Rakhlin, Alexander}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {24659--24700}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/mhammedi23a/mhammedi23a.pdf}, url = {https://proceedings.mlr.press/v202/mhammedi23a.html}, abstract = {We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the current action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis of MusIK leverages several new techniques tailored to non-optimistic algorithms for reward-free exploration, which we anticipate will find broader use.} }
Endnote
%0 Conference Paper %T Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL %A Zakaria Mhammedi %A Dylan J Foster %A Alexander Rakhlin %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-mhammedi23a %I PMLR %P 24659--24700 %U https://proceedings.mlr.press/v202/mhammedi23a.html %V 202 %X We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the current action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis of MusIK leverages several new techniques tailored to non-optimistic algorithms for reward-free exploration, which we anticipate will find broader use.
APA
Mhammedi, Z., Foster, D.J. & Rakhlin, A.. (2023). Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:24659-24700 Available from https://proceedings.mlr.press/v202/mhammedi23a.html.

Related Material