Socially-Attentive Policy Optimization in Multi-Agent Self-Driving System

Zipeng Dai; Tianze Zhou; Kun Shao; David Henry Mguni; Bin Wang; Jianye HAO

Proceedings of The 6th Conference on Robot Learning, PMLR 205:946-955, 2023.

Abstract

As increasing numbers of autonomous vehicles (AVs) are being deployed, it is important to construct a multi-agent self-driving (MASD) system for navigating traffic flows of AVs. In an MASD system, AVs not only navigate themselves to pursue their own goals, but also interact with each other to prevent congestion or collision, especially in scenarios like intersections or lane merging. Multi-agent reinforcement learning (MARL) provides an appealing alternative for generating safe and efficient actions for multiple AVs. However, current MARL methods are limited to describing scenarios where agents interact in either a cooperative or competitive fashion within one episode. Ordinarily, the agents’ objectives are defined with a global or team reward function, which fails to deal with the dynamic social preferences (SPs) and mixed motives, like those of human drivers, in traffic interactions. To this end, we propose a novel MARL method called Socially-Attentive Policy Optimization (SAPO), which incorporates: (a) a self-attention module to select the most interactive traffic participant for each AV, and (b) a social-aware integration mechanism to integrate objectives of interacting AVs by estimating the dynamic social preferences from their observations. SAPO solves the problem of how to improve the safety and efficiency of MASD systems by enabling AVs to learn socially-compatible behaviors. Simulation experiments show that SAPO can successfully capture and utilize the variation of the SPs of AVs to achieve superior performance compared with baselines in MASD scenarios.

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-dai23a,
  title = 	 {Socially-Attentive Policy Optimization in Multi-Agent Self-Driving System},
  author =       {Dai, Zipeng and Zhou, Tianze and Shao, Kun and Mguni, David Henry and Wang, Bin and HAO, Jianye},
  booktitle = 	 {Proceedings of The 6th Conference on Robot Learning},
  pages = 	 {946--955},
  year = 	 {2023},
  editor = 	 {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume = 	 {205},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {14--18 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v205/dai23a/dai23a.pdf},
  url = 	 {https://proceedings.mlr.press/v205/dai23a.html},
  abstract = 	 {As increasing numbers of autonomous vehicles (AVs) are being deployed, it is important to construct a multi-agent self-driving (MASD) system for navigating traffic flows of AVs. In an MASD system, AVs not only navigate themselves to pursue their own goals, but also interact with each other to prevent congestion or collision, especially in scenarios like intersections or lane merging. Multi-agent reinforcement learning (MARL) provides an appealing alternative for generating safe and efficient actions for multiple AVs. However, current MARL methods are limited to describing scenarios where agents interact in either a cooperative or competitive fashion within one episode. Ordinarily, the agents’ objectives are defined with a global or team reward function, which fails to deal with the dynamic social preferences (SPs) and mixed motives, like those of human drivers, in traffic interactions. To this end, we propose a novel MARL method called Socially-Attentive Policy Optimization (SAPO), which incorporates: (a) a self-attention module to select the most interactive traffic participant for each AV, and (b) a social-aware integration mechanism to integrate objectives of interacting AVs by estimating the dynamic social preferences from their observations. SAPO solves the problem of how to improve the safety and efficiency of MASD systems by enabling AVs to learn socially-compatible behaviors. Simulation experiments show that SAPO can successfully capture and utilize the variation of the SPs of AVs to achieve superior performance compared with baselines in MASD scenarios.}
}

Endnote

%0 Conference Paper
%T Socially-Attentive Policy Optimization in Multi-Agent Self-Driving System
%A Zipeng Dai
%A Tianze Zhou
%A Kun Shao
%A David Henry Mguni
%A Bin Wang
%A Jianye HAO
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-dai23a
%I PMLR
%P 946--955
%U https://proceedings.mlr.press/v205/dai23a.html
%V 205
%X As increasing numbers of autonomous vehicles (AVs) are being deployed, it is important to construct a multi-agent self-driving (MASD) system for navigating traffic flows of AVs. In an MASD system, AVs not only navigate themselves to pursue their own goals, but also interact with each other to prevent congestion or collision, especially in scenarios like intersections or lane merging. Multi-agent reinforcement learning (MARL) provides an appealing alternative for generating safe and efficient actions for multiple AVs. However, current MARL methods are limited to describing scenarios where agents interact in either a cooperative or competitive fashion within one episode. Ordinarily, the agents’ objectives are defined with a global or team reward function, which fails to deal with the dynamic social preferences (SPs) and mixed motives, like those of human drivers, in traffic interactions. To this end, we propose a novel MARL method called Socially-Attentive Policy Optimization (SAPO), which incorporates: (a) a self-attention module to select the most interactive traffic participant for each AV, and (b) a social-aware integration mechanism to integrate objectives of interacting AVs by estimating the dynamic social preferences from their observations. SAPO solves the problem of how to improve the safety and efficiency of MASD systems by enabling AVs to learn socially-compatible behaviors. Simulation experiments show that SAPO can successfully capture and utilize the variation of the SPs of AVs to achieve superior performance compared with baselines in MASD scenarios.

APA


Dai, Z., Zhou, T., Shao, K., Mguni, D.H., Wang, B. & HAO, J. (2023). Socially-Attentive Policy Optimization in Multi-Agent Self-Driving System. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:946-955. Available from https://proceedings.mlr.press/v205/dai23a.html.
