Regret, stability & fairness in matching markets with bandit learners

Sarah H. Cen; Devavrat Shah

Regret, stability & fairness in matching markets with bandit learners

Sarah H. Cen, Devavrat Shah

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:8938-8968, 2022.

Abstract

Making an informed decision—for example, when choosing a career or housing—requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition—namely, costs and transfers—we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-cen22a,
  title = 	 { Regret, stability & fairness in matching markets with bandit learners },
  author =       {Cen, Sarah H. and Shah, Devavrat},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {8938--8968},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/cen22a/cen22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/cen22a.html},
  abstract = 	 { Making an informed decision—for example, when choosing a career or housing—requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition—namely, costs and transfers—we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare. }
}

Endnote

%0 Conference Paper
%T  Regret, stability & fairness in matching markets with bandit learners 
%A Sarah H. Cen
%A Devavrat Shah
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera	
%F pmlr-v151-cen22a
%I PMLR
%P 8938--8968
%U https://proceedings.mlr.press/v151/cen22a.html
%V 151
%X  Making an informed decision—for example, when choosing a career or housing—requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition—namely, costs and transfers—we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.

APA


Cen, S.H. & Shah, D.. (2022).  Regret, stability & fairness in matching markets with bandit learners . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:8938-8968 Available from https://proceedings.mlr.press/v151/cen22a.html.

Related Material

Download PDF