Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship

Abishek Sankararaman, Soumya Basu, Karthik Abinav Sankararaman
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1252-1260, 2021.

Abstract

Online learning in a two-sided matching market, with demand side agents continuously competing to be matched with supply side (arms), abstracts the complex interactions under partial information on matching platforms (e.g. UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a two-sided matching market where the demand side agents have unknown and heterogeneous valuation over the supply side (arms), while the arms have known uniform preference over the demand side (agents). We design the first decentralized algorithm - UCB with Decentralized Dominant-arm Deletion (UCB-D3), for the agents, that does not require any knowledge of reward gaps or time horizon. UCB-D3 works in phases, where in each phase, agents delete dominated arms – the arms preferred by higher ranked agents, and play only from the non-dominated arms according to the UCB. At the end of the phase, agents broadcast in a decentralized fashion, their estimated preferred arms through pure exploitation. We prove a new regret lower bound for the decentralized serial dictatorship model, and prove that UCB-D3 achieves order optimal regret guarantee.

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-sankararaman21a, title = { Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship }, author = {Sankararaman, Abishek and Basu, Soumya and Abinav Sankararaman, Karthik}, booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics}, pages = {1252--1260}, year = {2021}, editor = {Banerjee, Arindam and Fukumizu, Kenji}, volume = {130}, series = {Proceedings of Machine Learning Research}, month = {13--15 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v130/sankararaman21a/sankararaman21a.pdf}, url = {https://proceedings.mlr.press/v130/sankararaman21a.html}, abstract = { Online learning in a two-sided matching market, with demand side agents continuously competing to be matched with supply side (arms), abstracts the complex interactions under partial information on matching platforms (e.g. UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a two-sided matching market where the demand side agents have unknown and heterogeneous valuation over the supply side (arms), while the arms have known uniform preference over the demand side (agents). We design the first decentralized algorithm - UCB with Decentralized Dominant-arm Deletion (UCB-D3), for the agents, that does not require any knowledge of reward gaps or time horizon. UCB-D3 works in phases, where in each phase, agents delete dominated arms – the arms preferred by higher ranked agents, and play only from the non-dominated arms according to the UCB. At the end of the phase, agents broadcast in a decentralized fashion, their estimated preferred arms through pure exploitation. We prove a new regret lower bound for the decentralized serial dictatorship model, and prove that UCB-D3 achieves order optimal regret guarantee. } }
Endnote
%0 Conference Paper %T Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship %A Abishek Sankararaman %A Soumya Basu %A Karthik Abinav Sankararaman %B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2021 %E Arindam Banerjee %E Kenji Fukumizu %F pmlr-v130-sankararaman21a %I PMLR %P 1252--1260 %U https://proceedings.mlr.press/v130/sankararaman21a.html %V 130 %X Online learning in a two-sided matching market, with demand side agents continuously competing to be matched with supply side (arms), abstracts the complex interactions under partial information on matching platforms (e.g. UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a two-sided matching market where the demand side agents have unknown and heterogeneous valuation over the supply side (arms), while the arms have known uniform preference over the demand side (agents). We design the first decentralized algorithm - UCB with Decentralized Dominant-arm Deletion (UCB-D3), for the agents, that does not require any knowledge of reward gaps or time horizon. UCB-D3 works in phases, where in each phase, agents delete dominated arms – the arms preferred by higher ranked agents, and play only from the non-dominated arms according to the UCB. At the end of the phase, agents broadcast in a decentralized fashion, their estimated preferred arms through pure exploitation. We prove a new regret lower bound for the decentralized serial dictatorship model, and prove that UCB-D3 achieves order optimal regret guarantee.
APA
Sankararaman, A., Basu, S. & Abinav Sankararaman, K.. (2021). Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1252-1260 Available from https://proceedings.mlr.press/v130/sankararaman21a.html.

Related Material