RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator

Hemant Kumawat, Biswadeep Chakraborty, Saibal Mukhopadhyay
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3474-3499, 2025.

Abstract

Developing agents that can perform complex control tasks from high-dimensional observations is a core ability of autonomous agents that requires underlying robust task control policies and adapting the underlying visual representations to the task. Most existing policies need a lot of training samples and treat this problem from the lens of two-stage learning with a controller learned on top of pre-trained vision models. We approach this problem from the lens of Koopman theory and learn visual representations from robotic agents conditioned on specific downstream tasks in the context of learning stabilizing control for the agent. We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent’s visual data in a high dimensional latent space and utilizes reinforcement learning to perform off-policy control on top of the extracted representations with a linear controller. Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches by improving efficiency and accuracy in learning task policies over extended horizons.

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-kumawat25a,
  title = {RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator},
  author = {Kumawat, Hemant and Chakraborty, Biswadeep and Mukhopadhyay, Saibal},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages = {3474--3499},
  year = {2025},
  editor = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = {270},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/kumawat25a/kumawat25a.pdf},
  url = {https://proceedings.mlr.press/v270/kumawat25a.html},
  abstract = {Developing agents that can perform complex control tasks from high-dimensional observations is a core ability of autonomous agents that requires underlying robust task control policies and adapting the underlying visual representations to the task. Most existing policies need a lot of training samples and treat this problem from the lens of two-stage learning with a controller learned on top of pre-trained vision models. We approach this problem from the lens of Koopman theory and learn visual representations from robotic agents conditioned on specific downstream tasks in the context of learning stabilizing control for the agent. We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent’s visual data in a high dimensional latent space and utilizes reinforcement learning to perform off-policy control on top of the extracted representations with a linear controller. Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches by improving efficiency and accuracy in learning task policies over extended horizons.}
}
Endnote
%0 Conference Paper
%T RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator
%A Hemant Kumawat
%A Biswadeep Chakraborty
%A Saibal Mukhopadhyay
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-kumawat25a
%I PMLR
%P 3474--3499
%U https://proceedings.mlr.press/v270/kumawat25a.html
%V 270
%X Developing agents that can perform complex control tasks from high-dimensional observations is a core ability of autonomous agents that requires underlying robust task control policies and adapting the underlying visual representations to the task. Most existing policies need a lot of training samples and treat this problem from the lens of two-stage learning with a controller learned on top of pre-trained vision models. We approach this problem from the lens of Koopman theory and learn visual representations from robotic agents conditioned on specific downstream tasks in the context of learning stabilizing control for the agent. We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent’s visual data in a high dimensional latent space and utilizes reinforcement learning to perform off-policy control on top of the extracted representations with a linear controller. Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches by improving efficiency and accuracy in learning task policies over extended horizons.
APA
Kumawat, H., Chakraborty, B. & Mukhopadhyay, S. (2025). RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3474-3499. Available from https://proceedings.mlr.press/v270/kumawat25a.html.
