FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

Aman Sinha, Matthew O’Kelly, Hongrui Zheng, Rahul Mangharam, John Duchi, Russ Tedrake
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8992-9004, 2020.

Abstract

Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents’ behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
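The abstract mentions a distributionally robust optimization step but does not spell it out. The sketch below is a hedged illustration only. It computes the worst-case expected cost of a policy over all beliefs that lie within a Kullback-Leibler ball of radius `rho` around a nominal belief `p` over opponents.

Several pieces are assumptions for illustration and are not the paper's formulation:

- the choice of the KL divergence;
- the function names `kl_divergence`, `tilt`, and `worst_case_cost`;
- the idea of shrinking `rho` as observations accumulate.

The paper may use a different divergence, radius schedule, and bandit update.

```python
import math
from typing import List, Sequence, Tuple


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p), using the convention 0 * log(0 / p) = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def tilt(p: Sequence[float], costs: Sequence[float], lam: float) -> List[float]:
    """Exponentially tilted belief: q_i proportional to p_i * exp(c_i / lam)."""
    c_max = max(costs)
    # Shift by the max cost so every exponent is <= 0 (avoids overflow).
    w = [pi * math.exp((ci - c_max) / lam) for pi, ci in zip(p, costs)]
    z = sum(w)
    return [wi / z for wi in w]


def worst_case_cost(
    p: Sequence[float], costs: Sequence[float], rho: float, iters: int = 200
) -> Tuple[float, List[float]]:
    """Maximize E_q[cost] over beliefs q with KL(q || p) <= rho.

    Illustrative assumption, not the paper's exact objective.
    p: nominal belief over opponents (every entry assumed > 0).
    costs: expected cost of the current policy against each opponent.
    rho: ambiguity radius; a larger rho means more risk aversion.
    """
    if rho <= 0.0:
        # No ambiguity: the robust cost is the nominal expected cost.
        return sum(pi * ci for pi, ci in zip(p, costs)), list(p)
    c_max = max(costs)
    top = [i for i, c in enumerate(costs) if c == c_max]
    p_top = sum(p[i] for i in top)
    if rho >= -math.log(p_top):
        # The budget suffices to move all mass onto the costliest opponents.
        q = [p[i] / p_top if i in top else 0.0 for i in range(len(p))]
        return c_max, q
    # KL(tilt(lam) || p) decreases monotonically in lam, so bisect on
    # log(lam) to find the lam at which the divergence constraint is tight.
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_divergence(tilt(p, costs, math.exp(mid)), p) > rho:
            lo = mid  # tilt too aggressive: increase lam
        else:
            hi = mid  # constraint satisfied: try a smaller lam
    q = tilt(p, costs, math.exp(hi))  # hi stays on the feasible side
    return sum(qi * ci for qi, ci in zip(q, costs)), q


if __name__ == "__main__":
    belief = [0.5, 0.3, 0.2]  # hypothetical belief over three opponents
    costs = [1.0, 2.0, 3.0]   # hypothetical per-opponent costs
    # Hypothetical schedule rho = 1 / n: more observations, less risk aversion.
    for n in (1, 10, 100):
        value, _ = worst_case_cost(belief, costs, rho=1.0 / n)
        print(n, round(value, 4))
```

In this sketch the radius acts as the risk-aversion knob. When `rho` is large, the robust cost approaches the cost against the worst opponent. As `rho` shrinks toward zero, the robust cost falls back to the nominal expected cost under `p`. This mirrors, only loosely, the abstract's description of adjusting risk aversion to the uncertainty in beliefs about opponents.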

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-sinha20a,
  title = {{F}ormula{Z}ero: Distributionally Robust Online Adaptation via Offline Population Synthesis},
  author = {Sinha, Aman and O'Kelly, Matthew and Zheng, Hongrui and Mangharam, Rahul and Duchi, John and Tedrake, Russ},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {8992--9004},
  year = {2020},
  editor = {Daum{\'e}, III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/sinha20a/sinha20a.pdf},
  url = {http://proceedings.mlr.press/v119/sinha20a.html},
  abstract = {Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents’ behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.}
}
EndNote
%0 Conference Paper
%T FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis
%A Aman Sinha
%A Matthew O’Kelly
%A Hongrui Zheng
%A Rahul Mangharam
%A John Duchi
%A Russ Tedrake
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-sinha20a
%I PMLR
%P 8992--9004
%U http://proceedings.mlr.press/v119/sinha20a.html
%V 119
%X Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents’ behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
APA
Sinha, A., O’Kelly, M., Zheng, H., Mangharam, R., Duchi, J., & Tedrake, R. (2020). FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8992-9004. Available from http://proceedings.mlr.press/v119/sinha20a.html.
