Learning Fair Division from Bandit Feedback

Hakuei Yamada, Junpei Komiyama, Kenshi Abe, Atsushi Iwasaki
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3106-3114, 2024.

Abstract

This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents’ values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents’ values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We also empirically verify the performance of the proposed algorithms across synthetic and empirical datasets.
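To make the setting concrete, below is a minimal Python sketch of an online allocation loop in this spirit: items arrive one at a time, the planner allocates using estimated values and dual-averaging-style pacing multipliers, and only a noisy value of the allocated item is observed afterwards. This is an illustrative sketch under assumed names and constants (value_est, beta, budgets, noise_std, and so on are not the paper's notation), not the authors' exact algorithm.

import numpy as np

# Illustrative sketch (not the authors' exact algorithm): online fair division
# with bandit feedback in a linear Fisher market. All names and constants here
# are assumptions for illustration.

rng = np.random.default_rng(0)

n_agents, n_types, horizon = 3, 5, 20_000
true_values = rng.uniform(0.1, 1.0, size=(n_agents, n_types))  # unknown to the planner
budgets = np.ones(n_agents)                                    # equal budgets
noise_std = 0.1

value_est = np.ones((n_agents, n_types))   # optimistic initial value estimates
counts = np.zeros((n_agents, n_types))     # observations per (agent, item type)
avg_utility = np.full(n_agents, 1e-3)      # running average per-round utility
beta = budgets / avg_utility               # pacing multipliers (dual variables)

for t in range(1, horizon + 1):
    j = rng.integers(n_types)                     # an item of a random type arrives
    i = int(np.argmax(value_est[:, j] * beta))    # allocate to the highest paced bid
    feedback = true_values[i, j] + noise_std * rng.normal()  # noisy bandit feedback

    # Update the empirical-mean estimate of agent i's value for item type j.
    counts[i, j] += 1
    value_est[i, j] += (feedback - value_est[i, j]) / counts[i, j]

    # Dual-averaging-style update: track each agent's average realized utility
    # and reset its pacing multiplier to budget / average utility, clipped to a
    # bounded positive range for stability.
    gain = np.zeros(n_agents)
    gain[i] = feedback
    avg_utility += (gain - avg_utility) / t
    beta = np.clip(budgets / np.maximum(avg_utility, 1e-6), 0.1, 100.0)

# Nash social welfare (geometric mean of per-round utilities) under the learned rule.
nsw = np.exp(np.mean(np.log(np.maximum(avg_utility, 1e-12))))
print(f"Empirical per-round Nash social welfare: {nsw:.4f}")

The pacing rule beta_i = budget_i / (average utility so far) mirrors dual-averaging approaches to online Fisher markets, while the bandit component is simply an empirical-mean estimate of each agent's value for each item type; the paper's wrapper algorithms combine these two ingredients with formal guarantees.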

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-yamada24a,
  title     = {Learning Fair Division from Bandit Feedback},
  author    = {Yamada, Hakuei and Komiyama, Junpei and Abe, Kenshi and Iwasaki, Atsushi},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3106--3114},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/yamada24a/yamada24a.pdf},
  url       = {https://proceedings.mlr.press/v238/yamada24a.html},
  abstract  = {This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents’ values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents’ values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We also empirically verify the performance of the proposed algorithms across synthetic and empirical datasets.}
}
Endnote
%0 Conference Paper
%T Learning Fair Division from Bandit Feedback
%A Hakuei Yamada
%A Junpei Komiyama
%A Kenshi Abe
%A Atsushi Iwasaki
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-yamada24a
%I PMLR
%P 3106--3114
%U https://proceedings.mlr.press/v238/yamada24a.html
%V 238
%X This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents’ values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents’ values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We also empirically verify the performance of the proposed algorithms across synthetic and empirical datasets.
APA
Yamada, H., Komiyama, J., Abe, K. & Iwasaki, A. (2024). Learning Fair Division from Bandit Feedback. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3106-3114. Available from https://proceedings.mlr.press/v238/yamada24a.html.