Nonstochastic Contextual Combinatorial Bandits

Lukas Zierahn, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Gergely Neu
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8771-8813, 2023.

Abstract

We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit, where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit, where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds of order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters, which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.
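For concreteness, the following minimal Python sketch illustrates the interaction protocol and the two feedback models described in the abstract. The loss-matrix notation (Theta_t), the dimensions, and the placeholder random action selection are illustrative assumptions for exposition only; they are not the paper's algorithms, which are built on a new loss estimator not reproduced here.

import numpy as np

# Sketch of one pass of the contextual combinatorial bandit protocol.
rng = np.random.default_rng(0)
d, n, T = 5, 3, 10            # action dimension, context dimension, number of rounds (assumed values)

for t in range(T):
    x_t = rng.standard_normal(n)           # context vector revealed to the learner
    Theta_t = rng.uniform(size=(d, n))     # loss matrix chosen by the environment (unknown to the learner)

    # Placeholder policy: pick a combinatorial action, here a 2-subset of the d
    # components, encoded as a binary vector.
    a_t = np.zeros(d)
    a_t[rng.choice(d, size=2, replace=False)] = 1.0

    component_losses = Theta_t @ x_t       # per-component losses, linear in the context
    total_loss = a_t @ component_losses    # bilinear loss actually incurred by the learner

    # Semi-bandit feedback: losses of the chosen components only.
    semi_bandit_obs = {int(i): component_losses[i] for i in np.flatnonzero(a_t)}
    # Full-bandit feedback: only the scalar total loss.
    full_bandit_obs = total_loss

The sketch only shows what information each feedback model exposes per round; a learning algorithm would replace the placeholder policy and use the observed feedback to estimate the losses of unchosen components.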

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-zierahn23a,
  title     = {Nonstochastic Contextual Combinatorial Bandits},
  author    = {Zierahn, Lukas and van der Hoeven, Dirk and Cesa-Bianchi, Nicol\`o and Neu, Gergely},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {8771--8813},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/zierahn23a/zierahn23a.pdf},
  url       = {https://proceedings.mlr.press/v206/zierahn23a.html},
  abstract  = {We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit, where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit, where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds of order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters, which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.}
}
Endnote
%0 Conference Paper
%T Nonstochastic Contextual Combinatorial Bandits
%A Lukas Zierahn
%A Dirk van der Hoeven
%A Nicolò Cesa-Bianchi
%A Gergely Neu
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-zierahn23a
%I PMLR
%P 8771--8813
%U https://proceedings.mlr.press/v206/zierahn23a.html
%V 206
%X We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit, where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit, where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds of order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters, which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.
APA
Zierahn, L., van der Hoeven, D., Cesa-Bianchi, N. & Neu, G. (2023). Nonstochastic Contextual Combinatorial Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8771-8813. Available from https://proceedings.mlr.press/v206/zierahn23a.html.