EvoControl: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control

Samuel Holt, Todor Davchev, Dhruva Tirumala, Ben Moran, Atil Iscen, Antoine Laurens, Yixin Lin, Erik Frey, Markus Wulfmeier, Francesco Romano, Nicolas Heess
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:23451-23512, 2025.

Abstract

High-frequency control in continuous action and state spaces is essential for practical applications in the physical world. Directly applying end-to-end reinforcement learning to high-frequency control tasks struggles with assigning credit to actions across long temporal horizons, compounded by the difficulty of efficient exploration. The alternative, learning low-frequency policies that guide higher-frequency controllers (e.g., proportional-derivative (PD) controllers), can limit the total expressiveness of the combined control system, hindering overall performance. We introduce EvoControl, a novel bi-level policy learning framework for learning both a slow high-level policy (using PPO) and a fast low-level policy (using Evolution Strategies) for solving continuous control tasks. Learning the low-level policy with Evolution Strategies allows robust learning over the long horizons that arise when operating at higher frequencies. This enables EvoControl to learn to control interactions at a high frequency, benefitting from more efficient exploration and credit assignment than direct high-frequency torque control, without the need to hand-tune PD parameters. We empirically demonstrate that EvoControl can achieve a higher evaluation reward for continuous-control tasks compared to existing approaches, specifically excelling in tasks where high-frequency control is needed, such as those requiring safety-critical fast reactions.
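
The bi-level, multi-frequency structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. Everything concrete here is an assumption made for illustration: the 1-D point-mass environment, the constants `K` and `DT`, a fixed hand-written high-level policy standing in for the PPO-trained one, a two-parameter linear low-level policy, and a plain antithetic Evolution Strategies update that uses episode returns only.

```python
import random

K = 10     # low-level steps per high-level action (assumed ratio)
DT = 0.01  # integration step of the toy point mass (assumed)

def high_level_policy(pos, goal):
    # Stand-in for the slow policy (trained with PPO in the paper);
    # here it is fixed and simply emits the goal as a target position.
    return goal

def low_level_policy(theta, target, pos, vel):
    # Fast policy: linear map from (position error, velocity) to a force.
    return theta[0] * (target - pos) + theta[1] * vel

def rollout(theta, goal=1.0, high_steps=20):
    """Run one episode and return the total reward."""
    pos, vel, total_reward = 0.0, 0.0, 0.0
    for _ in range(high_steps):
        target = high_level_policy(pos, goal)   # one slow decision
        for _ in range(K):                      # K fast decisions
            force = low_level_policy(theta, target, pos, vel)
            force = max(-10.0, min(10.0, force))  # actuator limit (assumed)
            vel += force * DT
            pos += vel * DT
            total_reward -= (goal - pos) ** 2 * DT
    return total_reward

def es_step(theta, rng, sigma=0.1, lr=0.05, pairs=16):
    """One antithetic ES update of the low-level parameters."""
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = rollout([t + sigma * e for t, e in zip(theta, eps)])
        minus = rollout([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (plus - minus) * e / (2 * sigma * pairs)
    # Gradient ascent on the estimated search gradient.
    return [t + lr * g for t, g in zip(theta, grad)]

def train(iterations=30, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # start from a low-level policy that outputs zero force
    for _ in range(iterations):
        theta = es_step(theta, rng)
    return theta

if __name__ == "__main__":
    theta = train()
    print("learned low-level parameters:", theta)
    print("return before:", rollout([0.0, 0.0]), "after:", rollout(theta))
```

The ES update scores each perturbed low-level policy by its whole-episode return, so it needs no per-step credit assignment across the many fast steps; that is the property the abstract relies on at high control frequencies. In the full method the high-level policy is learned as well, which this sketch omits.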

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-holt25a,
  title = {{E}vo{C}ontrol: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control},
  author = {Holt, Samuel and Davchev, Todor and Tirumala, Dhruva and Moran, Ben and Iscen, Atil and Laurens, Antoine and Lin, Yixin and Frey, Erik and Wulfmeier, Markus and Romano, Francesco and Heess, Nicolas},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {23451--23512},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/holt25a/holt25a.pdf},
  url = {https://proceedings.mlr.press/v267/holt25a.html},
  abstract = {High-frequency control in continuous action and state spaces is essential for practical applications in the physical world. Directly applying end-to-end reinforcement learning to high-frequency control tasks struggles with assigning credit to actions across long temporal horizons, compounded by the difficulty of efficient exploration. The alternative, learning low-frequency policies that guide higher-frequency controllers (e.g., proportional-derivative (PD) controllers), can result in a limited total expressiveness of the combined control system, hindering overall performance. We introduce EvoControl, a novel bi-level policy learning framework for learning both a slow high-level policy (using PPO) and a fast low-level policy (using Evolution Strategies) for solving continuous control tasks. Learning with Evolution Strategies for the lower-policy allows robust learning for long horizons that crucially arise when operating at higher frequencies. This enables EvoControl to learn to control interactions at a high frequency, benefitting from more efficient exploration and credit assignment than direct high-frequency torque control without the need to hand-tune PD parameters. We empirically demonstrate that EvoControl can achieve a higher evaluation reward for continuous-control tasks compared to existing approaches, specifically excelling in tasks where high-frequency control is needed, such as those requiring safety-critical fast reactions.}
}
Endnote
%0 Conference Paper
%T EvoControl: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control
%A Samuel Holt
%A Todor Davchev
%A Dhruva Tirumala
%A Ben Moran
%A Atil Iscen
%A Antoine Laurens
%A Yixin Lin
%A Erik Frey
%A Markus Wulfmeier
%A Francesco Romano
%A Nicolas Heess
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-holt25a
%I PMLR
%P 23451--23512
%U https://proceedings.mlr.press/v267/holt25a.html
%V 267
%X High-frequency control in continuous action and state spaces is essential for practical applications in the physical world. Directly applying end-to-end reinforcement learning to high-frequency control tasks struggles with assigning credit to actions across long temporal horizons, compounded by the difficulty of efficient exploration. The alternative, learning low-frequency policies that guide higher-frequency controllers (e.g., proportional-derivative (PD) controllers), can result in a limited total expressiveness of the combined control system, hindering overall performance. We introduce EvoControl, a novel bi-level policy learning framework for learning both a slow high-level policy (using PPO) and a fast low-level policy (using Evolution Strategies) for solving continuous control tasks. Learning with Evolution Strategies for the lower-policy allows robust learning for long horizons that crucially arise when operating at higher frequencies. This enables EvoControl to learn to control interactions at a high frequency, benefitting from more efficient exploration and credit assignment than direct high-frequency torque control without the need to hand-tune PD parameters. We empirically demonstrate that EvoControl can achieve a higher evaluation reward for continuous-control tasks compared to existing approaches, specifically excelling in tasks where high-frequency control is needed, such as those requiring safety-critical fast reactions.
APA
Holt, S., Davchev, T., Tirumala, D., Moran, B., Iscen, A., Laurens, A., Lin, Y., Frey, E., Wulfmeier, M., Romano, F., & Heess, N. (2025). EvoControl: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:23451-23512. Available from https://proceedings.mlr.press/v267/holt25a.html.
