Opponent Modeling in Deep Reinforcement Learning


He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daumé III ;
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1804-1813, 2016.


Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because of strategies’ complex interaction and the non-stationary nature. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN), while retaining explicit modeling under multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.

Related Material