Nonstochastic Contextual Combinatorial Bandits

Lukas Zierahn, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Gergely Neu
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8771-8813, 2023.

Abstract

We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit, where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit, where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds of order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters, which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.
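For concreteness, the following minimal Python sketch illustrates the interaction protocol and the two feedback models described in the abstract. The loss-matrix notation (Theta_t), the dimensions, and the placeholder random action selection are illustrative assumptions for exposition only; they are not the paper's algorithms, which are built on a new loss estimator not reproduced here.

import numpy as np

# Sketch of one pass of the contextual combinatorial bandit protocol.
rng = np.random.default_rng(0)
d, n, T = 5, 3, 10            # action dimension, context dimension, number of rounds (assumed values)

for t in range(T):
    x_t = rng.standard_normal(n)           # context vector revealed to the learner
    Theta_t = rng.uniform(size=(d, n))     # loss matrix chosen by the environment (unknown to the learner)

    # Placeholder policy: pick a combinatorial action, here a 2-subset of the d
    # components, encoded as a binary vector.
    a_t = np.zeros(d)
    a_t[rng.choice(d, size=2, replace=False)] = 1.0

    component_losses = Theta_t @ x_t       # per-component losses, linear in the context
    total_loss = a_t @ component_losses    # bilinear loss actually incurred by the learner

    # Semi-bandit feedback: losses of the chosen components only.
    semi_bandit_obs = {int(i): component_losses[i] for i in np.flatnonzero(a_t)}
    # Full-bandit feedback: only the scalar total loss.
    full_bandit_obs = total_loss

The sketch only shows what information each feedback model exposes per round; a learning algorithm would replace the placeholder policy and use the observed feedback to estimate the losses of unchosen components.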

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-zierahn23a,
  title     = {Nonstochastic Contextual Combinatorial Bandits},
  author    = {Zierahn, Lukas and van der Hoeven, Dirk and Cesa-Bianchi, Nicol\`o and Neu, Gergely},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {8771--8813},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/zierahn23a/zierahn23a.pdf},
  url       = {https://proceedings.mlr.press/v206/zierahn23a.html},
  abstract  = {We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit, where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit, where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds of order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters, which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.}
}
Endnote
%0 Conference Paper
%T Nonstochastic Contextual Combinatorial Bandits
%A Lukas Zierahn
%A Dirk van der Hoeven
%A Nicolò Cesa-Bianchi
%A Gergely Neu
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-zierahn23a
%I PMLR
%P 8771--8813
%U https://proceedings.mlr.press/v206/zierahn23a.html
%V 206
%X We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit, where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit, where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds of order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters, which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.
APA
Zierahn, L., van der Hoeven, D., Cesa-Bianchi, N. & Neu, G. (2023). Nonstochastic Contextual Combinatorial Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8771-8813. Available from https://proceedings.mlr.press/v206/zierahn23a.html.