UniRank: Unimodal Bandit Algorithms for Online Ranking

Camille-Sovanneary Gauthier; Romaric Gaudel; Elisa Fromont

UniRank: Unimodal Bandit Algorithms for Online Ranking

Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:7279-7309, 2022.

Abstract

We tackle, in the multiple-play bandit setting, the online ranking problem of assigning L items to K predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. The regret bound of this algorithm is a direct consequence of the pseudo-unimodality property of the bandit setting with respect to a graph where nodes are ordered sets of indistinguishable items. The main contribution of UniRank is its O(L/$\Delta$ logT) regret for T consecutive assignments, where $\Delta$ relates to the reward-gap between two items. This regret bound is based on the usually implicit condition that two items may not have the same attractiveness. Experiments against state-of-the-art learning algorithms specialized or not for different click models, show that our method has better regret performance than other generic algorithms on real life and synthetic datasets.

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-gauthier22a,
  title = 	 {{U}ni{R}ank: Unimodal Bandit Algorithms for Online Ranking},
  author =       {Gauthier, Camille-Sovanneary and Gaudel, Romaric and Fromont, Elisa},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {7279--7309},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/gauthier22a/gauthier22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/gauthier22a.html},
  abstract = 	 {We tackle, in the multiple-play bandit setting, the online ranking problem of assigning L items to K predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. The regret bound of this algorithm is a direct consequence of the pseudo-unimodality property of the bandit setting with respect to a graph where nodes are ordered sets of indistinguishable items. The main contribution of UniRank is its O(L/$\Delta$ logT) regret for T consecutive assignments, where $\Delta$ relates to the reward-gap between two items. This regret bound is based on the usually implicit condition that two items may not have the same attractiveness. Experiments against state-of-the-art learning algorithms specialized or not for different click models, show that our method has better regret performance than other generic algorithms on real life and synthetic datasets.}
}

Endnote

%0 Conference Paper
%T UniRank: Unimodal Bandit Algorithms for Online Ranking
%A Camille-Sovanneary Gauthier
%A Romaric Gaudel
%A Elisa Fromont
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-gauthier22a
%I PMLR
%P 7279--7309
%U https://proceedings.mlr.press/v162/gauthier22a.html
%V 162
%X We tackle, in the multiple-play bandit setting, the online ranking problem of assigning L items to K predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. The regret bound of this algorithm is a direct consequence of the pseudo-unimodality property of the bandit setting with respect to a graph where nodes are ordered sets of indistinguishable items. The main contribution of UniRank is its O(L/$\Delta$ logT) regret for T consecutive assignments, where $\Delta$ relates to the reward-gap between two items. This regret bound is based on the usually implicit condition that two items may not have the same attractiveness. Experiments against state-of-the-art learning algorithms specialized or not for different click models, show that our method has better regret performance than other generic algorithms on real life and synthetic datasets.

APA

Gauthier, C., Gaudel, R. & Fromont, E.. (2022). UniRank: Unimodal Bandit Algorithms for Online Ranking. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:7279-7309 Available from https://proceedings.mlr.press/v162/gauthier22a.html.

Related Material

Download PDF