Robust Autonomy Emerges from Self-Play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor W. Killian, Stuart Bowers, Ozan Sener, Philipp Kraehenbuehl, Vladlen Koltun
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:11710-11737, 2025.

Abstract

Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic driving emerges entirely from self-play in simulation at unprecedented scale – 1.6 billion km of driving. This is enabled by Gigaflow, a batched simulator that can synthesize and train on 42 years of subjective driving experience per hour on a single 8-GPU node. The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. The policy outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training. The policy is realistic when assessed against human references and achieves unprecedented robustness, averaging 17.5 years of continuous driving between incidents in simulation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cusumano-towner25a,
  title     = {Robust Autonomy Emerges from Self-Play},
  author    = {Cusumano-Towner, Marco and Hafner, David and Hertzberg, Alexander and Huval, Brody and Petrenko, Aleksei and Vinitsky, Eugene and Wijmans, Erik and Killian, Taylor W. and Bowers, Stuart and Sener, Ozan and Kraehenbuehl, Philipp and Koltun, Vladlen},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {11710--11737},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cusumano-towner25a/cusumano-towner25a.pdf},
  url       = {https://proceedings.mlr.press/v267/cusumano-towner25a.html},
  abstract  = {Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic driving emerges entirely from self-play in simulation at unprecedented scale – 1.6 billion km of driving. This is enabled by Gigaflow, a batched simulator that can synthesize and train on 42 years of subjective driving experience per hour on a single 8-GPU node. The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. The policy outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training. The policy is realistic when assessed against human references and achieves unprecedented robustness, averaging 17.5 years of continuous driving between incidents in simulation.}
}
Endnote
%0 Conference Paper
%T Robust Autonomy Emerges from Self-Play
%A Marco Cusumano-Towner
%A David Hafner
%A Alexander Hertzberg
%A Brody Huval
%A Aleksei Petrenko
%A Eugene Vinitsky
%A Erik Wijmans
%A Taylor W. Killian
%A Stuart Bowers
%A Ozan Sener
%A Philipp Kraehenbuehl
%A Vladlen Koltun
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cusumano-towner25a
%I PMLR
%P 11710--11737
%U https://proceedings.mlr.press/v267/cusumano-towner25a.html
%V 267
%X Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic driving emerges entirely from self-play in simulation at unprecedented scale – 1.6 billion km of driving. This is enabled by Gigaflow, a batched simulator that can synthesize and train on 42 years of subjective driving experience per hour on a single 8-GPU node. The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. The policy outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training. The policy is realistic when assessed against human references and achieves unprecedented robustness, averaging 17.5 years of continuous driving between incidents in simulation.
APA
Cusumano-Towner, M., Hafner, D., Hertzberg, A., Huval, B., Petrenko, A., Vinitsky, E., Wijmans, E., Killian, T. W., Bowers, S., Sener, O., Kraehenbuehl, P., & Koltun, V. (2025). Robust Autonomy Emerges from Self-Play. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:11710-11737. Available from https://proceedings.mlr.press/v267/cusumano-towner25a.html.
