Competing Bandits in Time Varying Matching Markets

Deepan Muthirayan, Chinmay Maheshwari, Pramod Khargonekar, Shankar Sastry
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:1020-1031, 2023.

Abstract

We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed, known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of $\widetilde{\mathcal{O}}(L_T^{1/2}T^{1/2})$, where $L_T$ is the number of changes in the underlying preferences of the agents. Therefore, we show that the optimal rates for single-agent learning can be achieved despite the competition, up to a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.
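A common way to obtain $\widetilde{\mathcal{O}}(L_T^{1/2}T^{1/2})$ regret when the number of changes $L_T$ is known is to restart the learner on a fixed schedule so that each epoch is likely stationary. The sketch below is a generic restarted-UCB baseline for a single piecewise-stationary bandit, not the paper's matching-market algorithm; the class name `RestartUCB` and the epoch schedule are illustrative assumptions.

```python
import math
import random


def restart_schedule(T, L_T):
    """Epoch length for a restart scheme: split the horizon T into roughly
    L_T + 1 equal epochs, so each epoch is likely to see stationary rewards.
    (Illustrative choice; not taken from the paper.)"""
    return max(1, math.ceil(T / (L_T + 1)))


class RestartUCB:
    """UCB1 over n_arms that discards its statistics every `epoch` rounds.
    A generic baseline for piecewise-stationary bandits, NOT the paper's
    matching-market algorithm."""

    def __init__(self, n_arms, epoch):
        self.n_arms, self.epoch, self.t = n_arms, epoch, 0
        self._reset()

    def _reset(self):
        self.counts = [0] * self.n_arms   # pulls per arm since last restart
        self.sums = [0.0] * self.n_arms   # reward sums since last restart

    def select(self):
        if self.t % self.epoch == 0:      # scheduled restart
            self._reset()
        for a in range(self.n_arms):      # play each arm once after a reset
            if self.counts[a] == 0:
                return a
        n = sum(self.counts)
        return max(
            range(self.n_arms),
            key=lambda a: self.sums[a] / self.counts[a]
                          + math.sqrt(2 * math.log(n) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.t += 1
```

For example, with horizon $T = 600$ and $L_T = 1$ change, `restart_schedule(600, 1)` gives epochs of 300 rounds; after the restart the learner relearns the new best arm from scratch, which is what drives the $\sqrt{L_T T}$ scaling. The paper's contribution is achieving rates of this form for every player in a competitive two-sided market, which this single-agent sketch does not capture.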

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-muthirayan23a,
  title     = {Competing Bandits in Time Varying Matching Markets},
  author    = {Muthirayan, Deepan and Maheshwari, Chinmay and Khargonekar, Pramod and Sastry, Shankar},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {1020--1031},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/muthirayan23a/muthirayan23a.pdf},
  url       = {https://proceedings.mlr.press/v211/muthirayan23a.html},
  abstract  = {We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has fixed known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study the problem of how to learn when the preferences of the players are time varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of {$\widetilde{\mathcal{O}}(L^{1/2}_TT^{1/2})$} up to the number of changes in the underlying preferences of the agents, $L_T$. Therefore, we show that the optimal rates for single-agent learning can be achieved in spite of the competition up to a difference of a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.}
}
EndNote
%0 Conference Paper
%T Competing Bandits in Time Varying Matching Markets
%A Deepan Muthirayan
%A Chinmay Maheshwari
%A Pramod Khargonekar
%A Shankar Sastry
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-muthirayan23a
%I PMLR
%P 1020--1031
%U https://proceedings.mlr.press/v211/muthirayan23a.html
%V 211
%X We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has fixed known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study the problem of how to learn when the preferences of the players are time varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of {$\widetilde{\mathcal{O}}(L^{1/2}_TT^{1/2})$} up to the number of changes in the underlying preferences of the agents, $L_T$. Therefore, we show that the optimal rates for single-agent learning can be achieved in spite of the competition up to a difference of a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.
APA
Muthirayan, D., Maheshwari, C., Khargonekar, P., & Sastry, S. (2023). Competing Bandits in Time Varying Matching Markets. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:1020-1031. Available from https://proceedings.mlr.press/v211/muthirayan23a.html.