Learning Safe Control via On-the-Fly Bandit Exploration

Alexandre Capone, Ryan Kazuo Cosner, Aaron Ames, Sandra Hirche
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:6726-6745, 2025.

Abstract

Control tasks with safety requirements under high levels of model uncertainty are increasingly common. Machine learning techniques are frequently used to address such tasks, typically by leveraging model error bounds to specify robust constraint-based safety filters. However, if the learned model uncertainty is very high, the corresponding filters are potentially invalid, meaning no control input satisfies the constraints imposed by the safety filter. While most works address this issue by assuming some form of safe backup controller, ours tackles it by collecting additional data on the fly using a Gaussian process bandit-type algorithm. We combine a control barrier function with a learned model to specify a robust certificate that ensures safety if feasible. Whenever infeasibility occurs, we leverage the control barrier function to guide exploration, ensuring the collected data contributes toward the closed-loop system safety. By combining a safety filter with exploration in this manner, our method provably achieves safety in a general setting that does not require any prior model or backup controller, provided that the true system lies in a reproducing kernel Hilbert space. To the best of our knowledge, it is the first safe learning-based control method that achieves this.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-capone25a, title = {Learning Safe Control via On-the-Fly Bandit Exploration}, author = {Capone, Alexandre and Cosner, Ryan Kazuo and Ames, Aaron and Hirche, Sandra}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {6726--6745}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/capone25a/capone25a.pdf}, url = {https://proceedings.mlr.press/v267/capone25a.html}, abstract = {Control tasks with safety requirements under high levels of model uncertainty are increasingly common. Machine learning techniques are frequently used to address such tasks, typically by leveraging model error bounds to specify robust constraint-based safety filters. However, if the learned model uncertainty is very high, the corresponding filters are potentially invalid, meaning no control input satisfies the constraints imposed by the safety filter. While most works address this issue by assuming some form of safe backup controller, ours tackles it by collecting additional data on the fly using a Gaussian process bandit-type algorithm. We combine a control barrier function with a learned model to specify a robust certificate that ensures safety if feasible. Whenever infeasibility occurs, we leverage the control barrier function to guide exploration, ensuring the collected data contributes toward the closed-loop system safety. By combining a safety filter with exploration in this manner, our method provably achieves safety in a general setting that does not require any prior model or backup controller, provided that the true system lies in a reproducing kernel Hilbert space. To the best of our knowledge, it is the first safe learning-based control method that achieves this.} }
Endnote
%0 Conference Paper %T Learning Safe Control via On-the-Fly Bandit Exploration %A Alexandre Capone %A Ryan Kazuo Cosner %A Aaron Ames %A Sandra Hirche %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-capone25a %I PMLR %P 6726--6745 %U https://proceedings.mlr.press/v267/capone25a.html %V 267 %X Control tasks with safety requirements under high levels of model uncertainty are increasingly common. Machine learning techniques are frequently used to address such tasks, typically by leveraging model error bounds to specify robust constraint-based safety filters. However, if the learned model uncertainty is very high, the corresponding filters are potentially invalid, meaning no control input satisfies the constraints imposed by the safety filter. While most works address this issue by assuming some form of safe backup controller, ours tackles it by collecting additional data on the fly using a Gaussian process bandit-type algorithm. We combine a control barrier function with a learned model to specify a robust certificate that ensures safety if feasible. Whenever infeasibility occurs, we leverage the control barrier function to guide exploration, ensuring the collected data contributes toward the closed-loop system safety. By combining a safety filter with exploration in this manner, our method provably achieves safety in a general setting that does not require any prior model or backup controller, provided that the true system lies in a reproducing kernel Hilbert space. To the best of our knowledge, it is the first safe learning-based control method that achieves this.
APA
Capone, A., Cosner, R.K., Ames, A. & Hirche, S.. (2025). Learning Safe Control via On-the-Fly Bandit Exploration. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:6726-6745 Available from https://proceedings.mlr.press/v267/capone25a.html.

Related Material