Not All Errors Are Made Equal: A Regret Metric for Detecting System-level Trajectory Prediction Failures

Kensuke Nakamura, Thomas Tian, Andrea Bajcsy
Proceedings of The 8th Conference on Robot Learning, PMLR 270:4051-4065, 2025.

Abstract

Robot decision-making increasingly relies on data-driven human prediction models when operating around people. While these models are known to mispredict in out-of-distribution interactions, only a subset of prediction errors impact downstream robot performance. We propose characterizing such “system-level” prediction failures via the mathematical notion of regret: high-regret interactions are precisely those in which mispredictions degraded closed-loop robot performance. We further introduce a probabilistic generalization of regret that calibrates failure detection across disparate deployment contexts and renders regret compatible with reward-based and reward-free (e.g., generative) planners. In simulated autonomous driving interactions, we showcase that our system-level failure metric can automatically mine for closed-loop human-robot interactions that state-of-the-art generative human predictors and robot planners struggle with. We further find that the very presence of high-regret data during human predictor fine-tuning is highly predictive of robot re-deployment performance improvements. Furthermore, fine-tuning with the informative but significantly smaller high-regret data (23% of deployment data) is competitive with fine-tuning on the full deployment dataset, indicating a promising avenue for efficiently mitigating system-level human-robot interaction failures.
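To make the core idea concrete, here is a minimal sketch of how a regret score can be written for a reward-based planner; the notation below (reward \(R\), trajectories \(\tau^{\mathrm{exec}}\), \(\tau^{\dagger}\), \(\xi_H\)) is illustrative and is not taken from the paper:

\[
\mathrm{regret} \;=\; R\big(\tau^{\dagger};\, \xi_H\big) \;-\; R\big(\tau^{\mathrm{exec}};\, \xi_H\big)
\]

where \(\xi_H\) is the human trajectory actually observed at deployment, \(\tau^{\mathrm{exec}}\) is the closed-loop robot trajectory executed while planning against the learned predictor, and \(\tau^{\dagger}\) is the plan the robot would have chosen with hindsight access to \(\xi_H\). Under such a score, an interaction is flagged as a system-level failure only when the gap is large: a misprediction that leaves the gap near zero did not degrade closed-loop performance and is, by this metric, benign.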

Cite this Paper

BibTeX
@InProceedings{pmlr-v270-nakamura25a,
  title     = {Not All Errors Are Made Equal: A Regret Metric for Detecting System-level Trajectory Prediction Failures},
  author    = {Nakamura, Kensuke and Tian, Thomas and Bajcsy, Andrea},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {4051--4065},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/nakamura25a/nakamura25a.pdf},
  url       = {https://proceedings.mlr.press/v270/nakamura25a.html}
}
Endnote
%0 Conference Paper
%T Not All Errors Are Made Equal: A Regret Metric for Detecting System-level Trajectory Prediction Failures
%A Kensuke Nakamura
%A Thomas Tian
%A Andrea Bajcsy
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-nakamura25a
%I PMLR
%P 4051--4065
%U https://proceedings.mlr.press/v270/nakamura25a.html
%V 270
APA
Nakamura, K., Tian, T. & Bajcsy, A. (2025). Not All Errors Are Made Equal: A Regret Metric for Detecting System-level Trajectory Prediction Failures. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:4051-4065. Available from https://proceedings.mlr.press/v270/nakamura25a.html.
