The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics

Aamal Hussain, Francesco Belardinelli, Dario Paccagnan
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:14178-14202, 2023.

Abstract

Understanding the impact of exploration on the behaviour of multi-agent learning has, so far, benefited from the restriction to potential, or network zero-sum games in which convergence to an equilibrium can be shown. Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To progress this front, we study the smooth Q-Learning dynamics. We show that, in any network game, exploration by agents results in the convergence of Q-Learning to a neighbourhood of an equilibrium. This holds independently of whether the dynamics reach the equilibrium or display complex behaviours. We show that increasing the exploration rate decreases the size of this neighbourhood and also decreases the ability of all agents to improve their payoffs. Furthermore, in a broad class of games, the payoff performance of Q-Learning dynamics, measured by Social Welfare, decreases when the exploration rate increases. Our experiments show this to be a general phenomenon, namely that exploration leads to improved convergence of Q-Learning, at the cost of payoff performance.
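To make the object of study concrete, the following is a minimal sketch (not the authors' code) of smooth (Boltzmann) Q-Learning dynamics in a two-player matrix game, where each agent plays a softmax policy over its Q-values with exploration rate T. The payoff matrices, step size, and horizon below are arbitrary illustrative choices; they only serve to show how a larger T pushes play towards uniform mixing while a smaller T keeps play closer to best responses.

```python
# Illustrative sketch of smooth (Boltzmann) Q-learning in a 2-player matrix game.
# All numerical choices (A, B, alpha, steps) are assumptions for demonstration only.
import numpy as np

def softmax(q, T):
    """Boltzmann policy with exploration rate (temperature) T."""
    z = np.exp(q / T)
    return z / z.sum()

def run_smooth_q(A, B, T, alpha=0.05, steps=5000, seed=0):
    """Simulate smooth Q-learning for two agents with payoff matrices A (agent 1) and B (agent 2)."""
    rng = np.random.default_rng(seed)
    qx = rng.normal(size=A.shape[0])  # agent 1 Q-values
    qy = rng.normal(size=A.shape[1])  # agent 2 Q-values
    for _ in range(steps):
        x, y = softmax(qx, T), softmax(qy, T)
        # expected payoff of each action against the opponent's current mixed strategy
        rx, ry = A @ y, B.T @ x
        # smooth Q-value update towards the expected payoffs
        qx += alpha * (rx - qx)
        qy += alpha * (ry - qy)
    return softmax(qx, T), softmax(qy, T)

# Example 2x2 game: increasing T drives both policies towards the uniform distribution.
A = np.array([[1.0, 5.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [5.0, 3.0]])
for T in (0.1, 1.0, 5.0):
    x, y = run_smooth_q(A, B, T)
    print(f"T={T}: x={np.round(x, 3)}, y={np.round(y, 3)}")
```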

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-hussain23a,
  title     = {The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics},
  author    = {Hussain, Aamal and Belardinelli, Francesco and Paccagnan, Dario},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {14178--14202},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/hussain23a/hussain23a.pdf},
  url       = {https://proceedings.mlr.press/v202/hussain23a.html},
  abstract  = {Understanding the impact of exploration on the behaviour of multi-agent learning has, so far, benefited from the restriction to potential, or network zero-sum games in which convergence to an equilibrium can be shown. Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To progress this front, we study the smooth Q-Learning dynamics. We show that, in any network game, exploration by agents results in the convergence of Q-Learning to a neighbourhood of an equilibrium. This holds independently of whether the dynamics reach the equilibrium or display complex behaviours. We show that increasing the exploration rate decreases the size of this neighbourhood and also decreases the ability of all agents to improve their payoffs. Furthermore, in a broad class of games, the payoff performance of Q-Learning dynamics, measured by Social Welfare, decreases when the exploration rate increases. Our experiments show this to be a general phenomenon, namely that exploration leads to improved convergence of Q-Learning, at the cost of payoff performance.}
}
Endnote
%0 Conference Paper
%T The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics
%A Aamal Hussain
%A Francesco Belardinelli
%A Dario Paccagnan
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-hussain23a
%I PMLR
%P 14178--14202
%U https://proceedings.mlr.press/v202/hussain23a.html
%V 202
%X Understanding the impact of exploration on the behaviour of multi-agent learning has, so far, benefited from the restriction to potential, or network zero-sum games in which convergence to an equilibrium can be shown. Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To progress this front, we study the smooth Q-Learning dynamics. We show that, in any network game, exploration by agents results in the convergence of Q-Learning to a neighbourhood of an equilibrium. This holds independently of whether the dynamics reach the equilibrium or display complex behaviours. We show that increasing the exploration rate decreases the size of this neighbourhood and also decreases the ability of all agents to improve their payoffs. Furthermore, in a broad class of games, the payoff performance of Q-Learning dynamics, measured by Social Welfare, decreases when the exploration rate increases. Our experiments show this to be a general phenomenon, namely that exploration leads to improved convergence of Q-Learning, at the cost of payoff performance.
APA
Hussain, A., Belardinelli, F. & Paccagnan, D. (2023). The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:14178-14202. Available from https://proceedings.mlr.press/v202/hussain23a.html.
