Combinatorial categorized bandits with expert rankings
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:403-412, 2023.
Many real-world systems such as e-commerce websites and content-serving platforms employ two-stage recommendation — in the first stage, multiple nominators (experts) provide ranked lists of items (one nominator per category, e.g., sports and political news articles), and in the second stage, an aggregator filters across the lists and outputs a single (short) list of K items to the users. The aggregation stage can be posed as a combinatorial multi-armed bandit problem, with the additional structure that the arms are grouped into categories (disjoint sets of items) and the ranking of arms within each category is known. We propose algorithms for selecting top K items in this setting under two learning objectives, namely minimizing regret over rounds and identifying the top K items within a fixed number of rounds. For each of the objectives, we provide sharp regret/error analysis using carefully defined notion of “gap” that exploits our problem structure. The resulting regret/error bounds strictly improve over prior work in combinatorial bandits literature. We also provide supporting evidence from simulations on synthetic and semi-synthetic problems.