Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

Xiyang Wu; Rohan Chandra; Tianrui Guan; Amrit Bedi; Dinesh Manocha

Proceedings of The 7th Conference on Robot Learning, PMLR 229:446-477, 2023.

Abstract

Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm for joint trajectory and intent prediction for autonomous vehicles in dense and heterogeneous environments. Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers’ intents solely from their local observations. We model an explicit representation of agents’ private incentives: Behavioral Incentive for high-level decision-making strategy that sets planning sub-goals and Instant Incentive for low-level motion planning to execute sub-goals. Our approach enables agents to infer their opponents’ behavior incentives and integrate this inferred information into their decision-making and motion-planning processes. We perform experiments on two simulation environments, Non-Cooperative Navigation and Heterogeneous Highway. In Heterogeneous Highway, results show that, compared with centralized training decentralized execution (CTDE) MARL baselines such as QMIX and MAPPO, our method yields a 4.3% and 38.4% higher episodic reward in mild and chaotic traffic, with 48.1% higher success rate and 80.6% longer survival time in chaotic traffic. We also compare with a decentralized training decentralized execution (DTDE) baseline IPPO and demonstrate a higher episodic reward of 12.7% and 6.3% in mild traffic and chaotic traffic, 25.3% higher success rate, and 13.7% longer survival time.

Cite this Paper

BibTeX


@InProceedings{pmlr-v229-wu23b,
  title = 	 {Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning},
  author =       {Wu, Xiyang and Chandra, Rohan and Guan, Tianrui and Bedi, Amrit and Manocha, Dinesh},
  booktitle = 	 {Proceedings of The 7th Conference on Robot Learning},
  pages = 	 {446--477},
  year = 	 {2023},
  editor = 	 {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume = 	 {229},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v229/wu23b/wu23b.pdf},
  url = 	 {https://proceedings.mlr.press/v229/wu23b.html},
  abstract = 	 {Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm for joint trajectory and intent prediction for autonomous vehicles in dense and heterogeneous environments. Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers’ intents solely from their local observations. We model an explicit representation of agents’ private incentives: Behavioral Incentive for high-level decision-making strategy that sets planning sub-goals and Instant Incentive for low-level motion planning to execute sub-goals. Our approach enables agents to infer their opponents’ behavior incentives and integrate this inferred information into their decision-making and motion-planning processes. We perform experiments on two simulation environments, Non-Cooperative Navigation and Heterogeneous Highway. In Heterogeneous Highway, results show that, compared with centralized training decentralized execution (CTDE) MARL baselines such as QMIX and MAPPO, our method yields a $4.3\%$ and $38.4\%$ higher episodic reward in mild and chaotic traffic, with $48.1\%$ higher success rate and $80.6\%$ longer survival time in chaotic traffic. We also compare with a decentralized training decentralized execution (DTDE) baseline IPPO and demonstrate a higher episodic reward of $12.7\%$ and $6.3\%$ in mild traffic and chaotic traffic, $25.3\%$ higher success rate, and $13.7\%$ longer survival time.}
}

Endnote

%0 Conference Paper
%T Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning
%A Xiyang Wu
%A Rohan Chandra
%A Tianrui Guan
%A Amrit Bedi
%A Dinesh Manocha
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-wu23b
%I PMLR
%P 446--477
%U https://proceedings.mlr.press/v229/wu23b.html
%V 229
%X Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm for joint trajectory and intent prediction for autonomous vehicles in dense and heterogeneous environments. Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers’ intents solely from their local observations. We model an explicit representation of agents’ private incentives: Behavioral Incentive for high-level decision-making strategy that sets planning sub-goals and Instant Incentive for low-level motion planning to execute sub-goals. Our approach enables agents to infer their opponents’ behavior incentives and integrate this inferred information into their decision-making and motion-planning processes. We perform experiments on two simulation environments, Non-Cooperative Navigation and Heterogeneous Highway. In Heterogeneous Highway, results show that, compared with centralized training decentralized execution (CTDE) MARL baselines such as QMIX and MAPPO, our method yields a 4.3% and 38.4% higher episodic reward in mild and chaotic traffic, with 48.1% higher success rate and 80.6% longer survival time in chaotic traffic. We also compare with a decentralized training decentralized execution (DTDE) baseline IPPO and demonstrate a higher episodic reward of 12.7% and 6.3% in mild traffic and chaotic traffic, 25.3% higher success rate, and 13.7% longer survival time.

APA


Wu, X., Chandra, R., Guan, T., Bedi, A. & Manocha, D. (2023). Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:446-477. Available from https://proceedings.mlr.press/v229/wu23b.html.
