Trajectory-Level Experimental Design for Fast Safety Parameter Estimation of Unknown Environments by Autonomous Systems

Aneesh Raghavan, Karl Henrik Johansson
Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:589-600, 2026.

Abstract

We consider the problem of exploring an unknown environment to identify safe and unsafe regions, with the objective of minimizing the number of samples required. The safety of each region is parameterized, and these parameters must be estimated. The exploration problem is formulated as maximizing the spectral gap (or equivalently, minimizing the mixing time) of the Markov chain induced by the agent’s policy and current parameter estimates. A closed-form solution to the resulting policy optimization problem is derived, leading to an adaptive exploration algorithm in which regions, once labeled as safe or unsafe, are no longer visited. We analyze the sample complexity required to complete the labeling task with high confidence, compare the proposed method against uniform random and Bayesian exploration strategies, and identify sufficient conditions under which the proposed algorithm achieves lower sample complexity.
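To illustrate the link between spectral gap and mixing time that the abstract relies on, here is a minimal sketch for a two-state Markov chain. It is not the paper's algorithm or environment model. The function names `two_state_spectral_gap` and `steps_to_mix`, the transition probabilities `a` and `b`, and the tolerance `eps` are all hypothetical, chosen only for illustration.

```python
def two_state_spectral_gap(a: float, b: float) -> float:
    """Absolute spectral gap of the chain P = [[1 - a, a], [b, 1 - b]].

    The eigenvalues of P are 1 and 1 - a - b, so the gap is 1 - |1 - a - b|.
    Illustrative assumption: a toy chain, not the chain studied in the paper.
    """
    if not (0.0 < a <= 1.0 and 0.0 < b <= 1.0 and a + b < 2.0):
        raise ValueError("need 0 < a, b <= 1 and a + b < 2 (aperiodic chain)")
    return 1.0 - abs(1.0 - a - b)


def steps_to_mix(a: float, b: float, eps: float = 0.01) -> int:
    """Steps until total variation distance to stationarity is at most eps.

    The chain starts in state 0. For two states, the total variation
    distance equals |P(state 1 at time t) - pi_1|.
    """
    two_state_spectral_gap(a, b)  # reuse the parameter validation
    pi_1 = a / (a + b)            # stationary probability of state 1
    p_1 = 0.0                     # probability of state 1 at time 0
    steps = 0
    while abs(p_1 - pi_1) > eps:
        p_1 = (1.0 - p_1) * a + p_1 * (1.0 - b)  # one step of the chain
        steps += 1
    return steps


if __name__ == "__main__":
    for a, b in [(0.5, 0.5), (0.1, 0.1)]:
        print(a, b, two_state_spectral_gap(a, b), steps_to_mix(a, b))
```

In this toy setting the chain with the larger gap reaches its stationary distribution in fewer steps, which is the intuition behind treating a larger spectral gap as faster exploration.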

Cite this Paper


BibTeX
@InProceedings{pmlr-v331-raghavan26b,
  title = {Trajectory-Level Experimental Design for Fast Safety Parameter Estimation of Unknown Environments by Autonomous Systems},
  author = {Raghavan, Aneesh and Johansson, Karl Henrik},
  booktitle = {Proceedings of The 8th Annual Learning for Dynamics and Control Conference},
  pages = {589--600},
  year = {2026},
  editor = {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay},
  volume = {331},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v331/main/assets/raghavan26b/raghavan26b.pdf},
  url = {https://proceedings.mlr.press/v331/raghavan26b.html},
  abstract = {We consider the problem of exploring an unknown environment to identify safe and unsafe regions, with the objective of minimizing the number of samples required. The safety of each region is parameterized, and these parameters must be estimated. The exploration problem is formulated as maximizing the spectral gap (or equivalently, minimizing the mixing time) of the Markov chain induced by the agent’s policy and current parameter estimates. A closed-form solution to the resulting policy optimization problem is derived, leading to an adaptive exploration algorithm in which regions, once labeled as safe or unsafe, are no longer visited. We analyze the sample complexity required to complete the labeling task with high confidence, compare the proposed method against uniform random and Bayesian exploration strategies, and identify sufficient conditions under which the proposed algorithm achieves lower sample complexity.}
}
Endnote
%0 Conference Paper
%T Trajectory-Level Experimental Design for Fast Safety Parameter Estimation of Unknown Environments by Autonomous Systems
%A Aneesh Raghavan
%A Karl Henrik Johansson
%B Proceedings of The 8th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2026
%E Gaurav Sukhatme
%E Lars Lindemann
%E Stephen Tu
%E Adam Wierman
%E Nikolay Atanasov
%F pmlr-v331-raghavan26b
%I PMLR
%P 589--600
%U https://proceedings.mlr.press/v331/raghavan26b.html
%V 331
%X We consider the problem of exploring an unknown environment to identify safe and unsafe regions, with the objective of minimizing the number of samples required. The safety of each region is parameterized, and these parameters must be estimated. The exploration problem is formulated as maximizing the spectral gap (or equivalently, minimizing the mixing time) of the Markov chain induced by the agent’s policy and current parameter estimates. A closed-form solution to the resulting policy optimization problem is derived, leading to an adaptive exploration algorithm in which regions, once labeled as safe or unsafe, are no longer visited. We analyze the sample complexity required to complete the labeling task with high confidence, compare the proposed method against uniform random and Bayesian exploration strategies, and identify sufficient conditions under which the proposed algorithm achieves lower sample complexity.
APA
Raghavan, A. & Johansson, K. H. (2026). Trajectory-Level Experimental Design for Fast Safety Parameter Estimation of Unknown Environments by Autonomous Systems. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:589-600. Available from https://proceedings.mlr.press/v331/raghavan26b.html.
