Combinatorial categorized bandits with expert rankings

Sayak Ray Chowdhury, Gaurav Sinha, Nagarajan Natarajan, Amit Sharma
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:403-412, 2023.

Abstract

Many real-world systems such as e-commerce websites and content-serving platforms employ two-stage recommendation — in the first stage, multiple nominators (experts) provide ranked lists of items (one nominator per category, e.g., sports and political news articles), and in the second stage, an aggregator filters across the lists and outputs a single (short) list of K items to the users. The aggregation stage can be posed as a combinatorial multi-armed bandit problem, with the additional structure that the arms are grouped into categories (disjoint sets of items) and the ranking of arms within each category is known. We propose algorithms for selecting top K items in this setting under two learning objectives, namely minimizing regret over rounds and identifying the top K items within a fixed number of rounds. For each of the objectives, we provide sharp regret/error analysis using carefully defined notion of “gap” that exploits our problem structure. The resulting regret/error bounds strictly improve over prior work in combinatorial bandits literature. We also provide supporting evidence from simulations on synthetic and semi-synthetic problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-chowdhury23a, title = {Combinatorial categorized bandits with expert rankings}, author = {Chowdhury, Sayak Ray and Sinha, Gaurav and Natarajan, Nagarajan and Sharma, Amit}, booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence}, pages = {403--412}, year = {2023}, editor = {Evans, Robin J. and Shpitser, Ilya}, volume = {216}, series = {Proceedings of Machine Learning Research}, month = {31 Jul--04 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v216/chowdhury23a/chowdhury23a.pdf}, url = {https://proceedings.mlr.press/v216/chowdhury23a.html}, abstract = {Many real-world systems such as e-commerce websites and content-serving platforms employ two-stage recommendation — in the first stage, multiple nominators (experts) provide ranked lists of items (one nominator per category, e.g., sports and political news articles), and in the second stage, an aggregator filters across the lists and outputs a single (short) list of K items to the users. The aggregation stage can be posed as a combinatorial multi-armed bandit problem, with the additional structure that the arms are grouped into categories (disjoint sets of items) and the ranking of arms within each category is known. We propose algorithms for selecting top K items in this setting under two learning objectives, namely minimizing regret over rounds and identifying the top K items within a fixed number of rounds. For each of the objectives, we provide sharp regret/error analysis using carefully defined notion of “gap” that exploits our problem structure. The resulting regret/error bounds strictly improve over prior work in combinatorial bandits literature. We also provide supporting evidence from simulations on synthetic and semi-synthetic problems.} }
Endnote
%0 Conference Paper %T Combinatorial categorized bandits with expert rankings %A Sayak Ray Chowdhury %A Gaurav Sinha %A Nagarajan Natarajan %A Amit Sharma %B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2023 %E Robin J. Evans %E Ilya Shpitser %F pmlr-v216-chowdhury23a %I PMLR %P 403--412 %U https://proceedings.mlr.press/v216/chowdhury23a.html %V 216 %X Many real-world systems such as e-commerce websites and content-serving platforms employ two-stage recommendation — in the first stage, multiple nominators (experts) provide ranked lists of items (one nominator per category, e.g., sports and political news articles), and in the second stage, an aggregator filters across the lists and outputs a single (short) list of K items to the users. The aggregation stage can be posed as a combinatorial multi-armed bandit problem, with the additional structure that the arms are grouped into categories (disjoint sets of items) and the ranking of arms within each category is known. We propose algorithms for selecting top K items in this setting under two learning objectives, namely minimizing regret over rounds and identifying the top K items within a fixed number of rounds. For each of the objectives, we provide sharp regret/error analysis using carefully defined notion of “gap” that exploits our problem structure. The resulting regret/error bounds strictly improve over prior work in combinatorial bandits literature. We also provide supporting evidence from simulations on synthetic and semi-synthetic problems.
APA
Chowdhury, S.R., Sinha, G., Natarajan, N. & Sharma, A.. (2023). Combinatorial categorized bandits with expert rankings. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:403-412 Available from https://proceedings.mlr.press/v216/chowdhury23a.html.

Related Material