[edit]
Decentralized Two-Sided Bandit Learning in Matching Market
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:4173-4191, 2024.
Abstract
Two-sided matching under uncertainty has recently drawn much attention due to its wide applications. Existing works in matching bandits mainly focus on the one-sided learning setting and design algorithms with the objective of converging to stable matching with low regret. In this paper, we consider the more general two-sided learning setting, i.e. participants on both sides have to learn their preferences over the other side through repeated interactions. Inspired by the classical result that the optimal matching for the proposing side can be obtained using the Gale-Shapley algorithm, our inquiry stems from the curiosity about whether this result still holds in a two-sided learning setting. To handle this question, we formally introduce the two-sided learning setting, addressing strategies for both the arm and player sides without restrictive assumptions such as special preference structure and observation of winning players. Our results not only provide a positive answer to our inquiry but also offer a near-optimal upper bound, achieving $O(\log T)$ regret.