Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control

Skand Peri, Akhil Perincherry, Bikram Pandit, Stefan Lee
Proceedings of The 9th Conference on Robot Learning, PMLR 305:221-237, 2025.

Abstract

Efficient robot locomotion often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly in the reward function. This requires carefully weighting the reward terms to avoid undesirable trade-offs where energy minimization harms task success, or vice versa. In this work, we propose a hyperparameter-free gradient optimization method to minimize energy without conflicting with task performance. Inspired by recent work in multitask learning, our method applies policy gradient projection between the task and energy objectives to promote non-conflicting updates. We evaluate this technique on standard locomotion benchmarks from DM-Control and HumanoidBench and demonstrate a $64$% reduction in energy usage while maintaining comparable task performance. Further, we conduct experiments on a Unitree GO2 quadruped, showcasing Sim2Real transfer of energy-efficient policies. Our method is easy to implement in standard RL pipelines with minimal code changes and offers a principled alternative to reward shaping for energy-efficient control policies.
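
The exact projection rule is not spelled out on this page; the sketch below only illustrates the PCGrad-style gradient projection the abstract alludes to, i.e., removing the component of the energy-minimization gradient that conflicts with the task gradient before combining them. Function and variable names are hypothetical, and plain NumPy is used for clarity.

```python
import numpy as np

def nonconflicting_update(g_task: np.ndarray, g_energy: np.ndarray) -> np.ndarray:
    """Combine task and energy policy gradients so the energy term
    cannot oppose the task term (PCGrad-style projection sketch).

    If the energy gradient conflicts with the task gradient
    (negative inner product), project it onto the plane orthogonal
    to the task gradient before summing the two.
    """
    dot = g_energy @ g_task
    if dot < 0.0:
        # Strip the component of g_energy that points against g_task.
        g_energy = g_energy - (dot / (g_task @ g_task + 1e-12)) * g_task
    return g_task + g_energy

# Toy usage: a 2-D example where the energy gradient partially opposes the task gradient.
g_t = np.array([1.0, 0.0])
g_e = np.array([-0.5, 1.0])
print(nonconflicting_update(g_t, g_e))  # -> [1. 1.]; the conflicting component is removed
```

In an actual RL pipeline, g_task and g_energy would be the flattened policy gradients of the task reward and the energy penalty, respectively, computed on the same batch before the optimizer step.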

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-peri25a,
  title     = {Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control},
  author    = {Peri, Skand and Perincherry, Akhil and Pandit, Bikram and Lee, Stefan},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {221--237},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/peri25a/peri25a.pdf},
  url       = {https://proceedings.mlr.press/v305/peri25a.html},
  abstract  = {Efficient robot locomotion often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly in the reward function. This requires carefully weighting the reward terms to avoid undesirable trade-offs where energy minimization harms task success or vice versa. In this work, we propose a hyperparameter-free gradient optimization method to minimize energy without conflicting with task performance. Inspired by recent works in multitask learning, our method applies policy gradient projection between task and energy objectives to promote non-conflicting updates. We evaluate this technique on standard locomotion benchmarks of DM-Control and HumanoidBench and demonstrate a reduction of $64$% energy usage while maintaining comparable task performance. Further, we conduct experiments on a Unitree GO2 quadruped showcasing Sim2Real transfer of energy efficient policies. Our method is easy to implement in standard RL pipelines with minimal code changes, and offers a principled alternative to reward shaping for energy efficient control policies.}
}
Endnote
%0 Conference Paper
%T Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control
%A Skand Peri
%A Akhil Perincherry
%A Bikram Pandit
%A Stefan Lee
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-peri25a
%I PMLR
%P 221--237
%U https://proceedings.mlr.press/v305/peri25a.html
%V 305
%X Efficient robot locomotion often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly in the reward function. This requires carefully weighting the reward terms to avoid undesirable trade-offs where energy minimization harms task success or vice versa. In this work, we propose a hyperparameter-free gradient optimization method to minimize energy without conflicting with task performance. Inspired by recent works in multitask learning, our method applies policy gradient projection between task and energy objectives to promote non-conflicting updates. We evaluate this technique on standard locomotion benchmarks of DM-Control and HumanoidBench and demonstrate a reduction of $64$% energy usage while maintaining comparable task performance. Further, we conduct experiments on a Unitree GO2 quadruped showcasing Sim2Real transfer of energy efficient policies. Our method is easy to implement in standard RL pipelines with minimal code changes, and offers a principled alternative to reward shaping for energy efficient control policies.
APA
Peri, S., Perincherry, A., Pandit, B. & Lee, S. (2025). Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:221-237. Available from https://proceedings.mlr.press/v305/peri25a.html.

Related Material