Language-Guided Traffic Simulation via Scene-Level Diffusion

Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray
Proceedings of The 7th Conference on Robot Learning, PMLR 229:144-177, 2023.

Abstract

Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user’s query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
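As a rough illustration of what "converting a query into a loss function" and using it for guidance can mean, the sketch below treats a query such as "stay under 10 m/s" as a differentiable penalty and nudges a sampled speed profile down its gradient. This is not the CTG++ implementation. The names `speed_limit_loss`, `speed_limit_grad`, and `guide`, the step size, and the one-agent speed profile are all assumptions made for illustration. In an actual guided diffusion model, such gradient steps would be interleaved with the denoising steps rather than applied to a fixed sample.

```python
from typing import Callable, List

# Hypothetical representation: per-timestep speeds (m/s) of a single agent.
Trajectory = List[float]


def speed_limit_loss(speeds: Trajectory, limit: float) -> float:
    """Sum of squared excess over the limit; zero when the profile complies."""
    return sum(max(0.0, v - limit) ** 2 for v in speeds)


def speed_limit_grad(speeds: Trajectory, limit: float) -> Trajectory:
    """Analytic gradient of speed_limit_loss with respect to each speed."""
    return [2.0 * max(0.0, v - limit) for v in speeds]


def guide(
    speeds: Trajectory,
    grad_fn: Callable[[Trajectory], Trajectory],
    step_size: float = 0.25,
    n_steps: int = 20,
) -> Trajectory:
    """Repeatedly move the sample against the loss gradient.

    Stand-in for the guidance update only; no denoising network is involved.
    """
    out = list(speeds)
    for _ in range(n_steps):
        grad = grad_fn(out)
        out = [v - step_size * g for v, g in zip(out, grad)]
    return out


if __name__ == "__main__":
    sampled = [8.0, 12.0, 15.0]  # made-up sample, not data from the paper
    guided = guide(sampled, lambda s: speed_limit_grad(s, limit=10.0))
    print(guided)
    print(speed_limit_loss(guided, limit=10.0))
```

With this step size the excess over the limit halves at every step, so compliant timesteps are left untouched while violating ones approach the limit from above.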

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-zhong23a,
  title = {Language-Guided Traffic Simulation via Scene-Level Diffusion},
  author = {Zhong, Ziyuan and Rempe, Davis and Chen, Yuxiao and Ivanovic, Boris and Cao, Yulong and Xu, Danfei and Pavone, Marco and Ray, Baishakhi},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages = {144--177},
  year = {2023},
  editor = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume = {229},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v229/zhong23a/zhong23a.pdf},
  url = {https://proceedings.mlr.press/v229/zhong23a.html},
  abstract = {Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user’s query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.}
}
Endnote
%0 Conference Paper
%T Language-Guided Traffic Simulation via Scene-Level Diffusion
%A Ziyuan Zhong
%A Davis Rempe
%A Yuxiao Chen
%A Boris Ivanovic
%A Yulong Cao
%A Danfei Xu
%A Marco Pavone
%A Baishakhi Ray
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-zhong23a
%I PMLR
%P 144--177
%U https://proceedings.mlr.press/v229/zhong23a.html
%V 229
%X Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user’s query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
APA
Zhong, Z., Rempe, D., Chen, Y., Ivanovic, B., Cao, Y., Xu, D., Pavone, M., & Ray, B. (2023). Language-Guided Traffic Simulation via Scene-Level Diffusion. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:144-177. Available from https://proceedings.mlr.press/v229/zhong23a.html.

Related Material