Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms

Ilker Demirel; Cem Tekin

Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms

Ilker Demirel, Cem Tekin

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3844-3852, 2021.

Abstract

Combinatorial bandit models and algorithms are used in many sequential decision-making tasks ranging from item list recommendation to influence maximization. Typical algorithms proposed for combinatorial bandits, including combinatorial UCB (CUCB) and combinatorial Thompson sampling (CTS) do not exploit correlations between base arms during the learning process. Moreover, their regret is usually analyzed under independent base arm outcomes. In this paper, we use Gaussian Processes (GPs) to model correlations between base arms. In particular, we consider a combinatorial bandit model with probabilistically triggered arms, and assume that the expected base arm outcome function is a sample from a GP. We assume that the learner has access to an exact computation oracle, which returns an optimal solution given expected base arm outcomes, and analyze the regret of Combinatorial Gaussian Process Upper Confidence Bound (ComGP-UCB) algorithm for this setting. Under (triggering probability modulated) Lipschitz continuity assumption on the expected reward function, we derive ($O( \sqrt{m T \log T \gamma_{T, \boldsymbol{\mu}}^{PTA}})$) $O(m \sqrt{\frac{T \log T}{p^*}})$ upper bounds for the regret of ComGP-UCB that hold with high probability, where $m$ denotes the number of base arms, $p^*$ denotes the minimum non-zero triggering probability, and $\gamma_{T, \boldsymbol{\mu}}^{PTA}$ denotes the pseudo-information gain. Finally, we show via simulations that when the correlations between base arm outcomes are strong, ComGP-UCB significantly outperforms CUCB and CTS.

Cite this Paper

BibTeX

@InProceedings{pmlr-v130-demirel21a,
  title = 	 { Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms },
  author =       {Demirel, Ilker and Tekin, Cem},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3844--3852},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/demirel21a/demirel21a.pdf},
  url = 	 {https://proceedings.mlr.press/v130/demirel21a.html},
  abstract = 	 { Combinatorial bandit models and algorithms are used in many sequential decision-making tasks ranging from item list recommendation to influence maximization. Typical algorithms proposed for combinatorial bandits, including combinatorial UCB (CUCB) and combinatorial Thompson sampling (CTS) do not exploit correlations between base arms during the learning process. Moreover, their regret is usually analyzed under independent base arm outcomes. In this paper, we use Gaussian Processes (GPs) to model correlations between base arms. In particular, we consider a combinatorial bandit model with probabilistically triggered arms, and assume that the expected base arm outcome function is a sample from a GP. We assume that the learner has access to an exact computation oracle, which returns an optimal solution given expected base arm outcomes, and analyze the regret of Combinatorial Gaussian Process Upper Confidence Bound (ComGP-UCB) algorithm for this setting. Under (triggering probability modulated) Lipschitz continuity assumption on the expected reward function, we derive ($O( \sqrt{m T \log T \gamma_{T, \boldsymbol{\mu}}^{PTA}})$) $O(m \sqrt{\frac{T \log T}{p^*}})$ upper bounds for the regret of ComGP-UCB that hold with high probability, where $m$ denotes the number of base arms, $p^*$ denotes the minimum non-zero triggering probability, and $\gamma_{T, \boldsymbol{\mu}}^{PTA}$ denotes the pseudo-information gain. Finally, we show via simulations that when the correlations between base arm outcomes are strong, ComGP-UCB significantly outperforms CUCB and CTS. }
}

Endnote

%0 Conference Paper
%T  Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms 
%A Ilker Demirel
%A Cem Tekin
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu	
%F pmlr-v130-demirel21a
%I PMLR
%P 3844--3852
%U https://proceedings.mlr.press/v130/demirel21a.html
%V 130
%X  Combinatorial bandit models and algorithms are used in many sequential decision-making tasks ranging from item list recommendation to influence maximization. Typical algorithms proposed for combinatorial bandits, including combinatorial UCB (CUCB) and combinatorial Thompson sampling (CTS) do not exploit correlations between base arms during the learning process. Moreover, their regret is usually analyzed under independent base arm outcomes. In this paper, we use Gaussian Processes (GPs) to model correlations between base arms. In particular, we consider a combinatorial bandit model with probabilistically triggered arms, and assume that the expected base arm outcome function is a sample from a GP. We assume that the learner has access to an exact computation oracle, which returns an optimal solution given expected base arm outcomes, and analyze the regret of Combinatorial Gaussian Process Upper Confidence Bound (ComGP-UCB) algorithm for this setting. Under (triggering probability modulated) Lipschitz continuity assumption on the expected reward function, we derive ($O( \sqrt{m T \log T \gamma_{T, \boldsymbol{\mu}}^{PTA}})$) $O(m \sqrt{\frac{T \log T}{p^*}})$ upper bounds for the regret of ComGP-UCB that hold with high probability, where $m$ denotes the number of base arms, $p^*$ denotes the minimum non-zero triggering probability, and $\gamma_{T, \boldsymbol{\mu}}^{PTA}$ denotes the pseudo-information gain. Finally, we show via simulations that when the correlations between base arm outcomes are strong, ComGP-UCB significantly outperforms CUCB and CTS.

APA

Demirel, I. & Tekin, C.. (2021).  Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:3844-3852 Available from https://proceedings.mlr.press/v130/demirel21a.html.

Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms

Abstract

Cite this Paper

Related Material