Efficient Contextual Bandits with Uninformed Feedback Graphs

Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:60336-60358, 2024.

Abstract

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by [Zhang et al., 2023] studies the contextual version of this problem and proposes an efficient and optimal algorithm via a reduction to online regression. However, their algorithm crucially relies on seeing the feedback graph before making each decision, while in many applications, the feedback graph is uninformed, meaning that it is either only revealed after the learner makes her decision or even never fully revealed at all. This work develops the first contextual algorithms for such uninformed settings, via an efficient reduction to online regression over both the losses and the graphs. Importantly, we show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees. We also demonstrate the empirical effectiveness of our algorithm on a bidding application using both synthetic and real-world data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhang24ce, title = {Efficient Contextual Bandits with Uninformed Feedback Graphs}, author = {Zhang, Mengxiao and Zhang, Yuheng and Luo, Haipeng and Mineiro, Paul}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {60336--60358}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24ce/zhang24ce.pdf}, url = {https://proceedings.mlr.press/v235/zhang24ce.html}, abstract = {Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by [Zhang et al., 2023] studies the contextual version of this problem and proposes an efficient and optimal algorithm via a reduction to online regression. However, their algorithm crucially relies on seeing the feedback graph before making each decision, while in many applications, the feedback graph is uninformed, meaning that it is either only revealed after the learner makes her decision or even never fully revealed at all. This work develops the first contextual algorithms for such uninformed settings, via an efficient reduction to online regression over both the losses and the graphs. Importantly, we show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees. We also demonstrate the empirical effectiveness of our algorithm on a bidding application using both synthetic and real-world data.} }
Endnote
%0 Conference Paper %T Efficient Contextual Bandits with Uninformed Feedback Graphs %A Mengxiao Zhang %A Yuheng Zhang %A Haipeng Luo %A Paul Mineiro %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-zhang24ce %I PMLR %P 60336--60358 %U https://proceedings.mlr.press/v235/zhang24ce.html %V 235 %X Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by [Zhang et al., 2023] studies the contextual version of this problem and proposes an efficient and optimal algorithm via a reduction to online regression. However, their algorithm crucially relies on seeing the feedback graph before making each decision, while in many applications, the feedback graph is uninformed, meaning that it is either only revealed after the learner makes her decision or even never fully revealed at all. This work develops the first contextual algorithms for such uninformed settings, via an efficient reduction to online regression over both the losses and the graphs. Importantly, we show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees. We also demonstrate the empirical effectiveness of our algorithm on a bidding application using both synthetic and real-world data.
APA
Zhang, M., Zhang, Y., Luo, H. & Mineiro, P.. (2024). Efficient Contextual Bandits with Uninformed Feedback Graphs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:60336-60358 Available from https://proceedings.mlr.press/v235/zhang24ce.html.

Related Material