Opponent Modeling in Deep Reinforcement Learning

He He; Jordan Boyd-Graber; Kevin Kwok; Hal Daumé, III

Opponent Modeling in Deep Reinforcement Learning

He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daumé III

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1804-1813, 2016.

Abstract

Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because of strategies’ complex interaction and the non-stationary nature. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN), while retaining explicit modeling under multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-he16,
  title = 	 {Opponent Modeling in Deep Reinforcement Learning},
  author =       {He, He and Boyd-Graber, Jordan and Kwok, Kevin and Daum\'e, III, Hal},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1804--1813},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/he16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/he16.html},
  abstract = 	 {Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because of strategies’ complex interaction and the non-stationary nature. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN), while retaining explicit modeling under multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.}
}

Endnote

%0 Conference Paper
%T Opponent Modeling in Deep Reinforcement Learning
%A He He
%A Jordan Boyd-Graber
%A Kevin Kwok
%A Hal Daumé, III
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-he16
%I PMLR
%P 1804--1813
%U https://proceedings.mlr.press/v48/he16.html
%V 48
%X Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because of strategies’ complex interaction and the non-stationary nature. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN), while retaining explicit modeling under multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.

RIS


TY  - CPAPER
TI  - Opponent Modeling in Deep Reinforcement Learning
AU  - He He
AU  - Jordan Boyd-Graber
AU  - Kevin Kwok
AU  - Hal Daumé, III
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-he16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1804
EP  - 1813
L1  - http://proceedings.mlr.press/v48/he16.pdf
UR  - https://proceedings.mlr.press/v48/he16.html
AB  - Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because of strategies’ complex interaction and the non-stationary nature. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN), while retaining explicit modeling under multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
ER  -

APA


He, H., Boyd-Graber, J., Kwok, K. & Daumé, III, H.. (2016). Opponent Modeling in Deep Reinforcement Learning. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1804-1813 Available from https://proceedings.mlr.press/v48/he16.html.

Related Material

Download PDF