Bayesian Change-Point Detection for Bandit Feedback
 in Non-stationary Environments

Reda Alami

Bayesian Change-Point Detection for Bandit Feedback in Non-stationary Environments

Reda Alami

Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:17-31, 2023.

Abstract

The stochastic multi-armed bandit problem has been widely studied under the stationary assumption. However in real world problems and industrial applications, this assumption is often unrealistic because the distributions of rewards may change over time. In this paper, we consider the piece-wise iid non-stationary stochastic multi-armed bandit problem with unknown change-points and we focus on the change of mean setup. To solve the latter, we propose a change-point based framework where we study a class of change-detection based optimal bandit policies that actively detects change-point using the restarted Bayesian online change-point detector and then restarts the bandit indices. Analytically, in the context of regret minimization, our proposal achieves a

$\mathcal{O}(\sqrt{A T K_T })$ regret upper-bound where

$K_T$ is the overall number of change-points up to the horizon

$T$ and

$A$ is the number of arms. The derived bound matches the existing lower bound for abruptly changing environments. Finally, we demonstrate the cumulative regret reduction of the our proposal over synthetic Bernoulli rewards as well as Yahoo! datasets of webpage click-through rates.

Cite this Paper

BibTeX


@InProceedings{pmlr-v189-alami23a,
  title = 	 {Bayesian Change-Point Detection for Bandit Feedback
 in Non-stationary Environments},
  author =       {Alami, Reda},
  booktitle = 	 {Proceedings of The 14th Asian Conference on Machine
 Learning},
  pages = 	 {17--31},
  year = 	 {2023},
  editor = 	 {Khan, Emtiyaz and Gonen, Mehmet},
  volume = 	 {189},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {12--14 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v189/alami23a/alami23a.pdf},
  url = 	 {https://proceedings.mlr.press/v189/alami23a.html},
  abstract = 	 {The stochastic multi-armed bandit problem has been
 widely studied under the stationary
 assumption. However in real world problems and
 industrial applications, this assumption is often
 unrealistic because the distributions of rewards may
 change over time. In this paper, we consider the
 piece-wise iid non-stationary stochastic multi-armed
 bandit problem with unknown change-points and we
 focus on the change of mean setup. To solve the
 latter, we propose a change-point based framework
 where we study a class of change-detection based
 optimal bandit policies that actively detects
 change-point using the restarted Bayesian online
 change-point detector and then restarts the bandit
 indices. Analytically, in the context of regret
 minimization, our proposal achieves a
 $\mathcal{O}(\sqrt{A T K_T })$ regret upper-bound
 where $K_T$ is the overall number of change-points
 up to the horizon $T$ and $A$ is the number of
 arms. The derived bound matches the existing lower
 bound for abruptly changing environments. Finally,
 we demonstrate the cumulative regret reduction of
 the our proposal over synthetic Bernoulli rewards as
 well as Yahoo! datasets of webpage click-through
 rates.}
}

Endnote

%0 Conference Paper
%T Bayesian Change-Point Detection for Bandit Feedback
 in Non-stationary Environments
%A Reda Alami
%B Proceedings of The 14th Asian Conference on Machine
 Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Emtiyaz Khan
%E Mehmet Gonen	
%F pmlr-v189-alami23a
%I PMLR
%P 17--31
%U https://proceedings.mlr.press/v189/alami23a.html
%V 189
%X The stochastic multi-armed bandit problem has been
 widely studied under the stationary
 assumption. However in real world problems and
 industrial applications, this assumption is often
 unrealistic because the distributions of rewards may
 change over time. In this paper, we consider the
 piece-wise iid non-stationary stochastic multi-armed
 bandit problem with unknown change-points and we
 focus on the change of mean setup. To solve the
 latter, we propose a change-point based framework
 where we study a class of change-detection based
 optimal bandit policies that actively detects
 change-point using the restarted Bayesian online
 change-point detector and then restarts the bandit
 indices. Analytically, in the context of regret
 minimization, our proposal achieves a
 $\mathcal{O}(\sqrt{A T K_T })$ regret upper-bound
 where $K_T$ is the overall number of change-points
 up to the horizon $T$ and $A$ is the number of
 arms. The derived bound matches the existing lower
 bound for abruptly changing environments. Finally,
 we demonstrate the cumulative regret reduction of
 the our proposal over synthetic Bernoulli rewards as
 well as Yahoo! datasets of webpage click-through
 rates.

APA


Alami, R.. (2023). Bayesian Change-Point Detection for Bandit Feedback
 in Non-stationary Environments. Proceedings of The 14th Asian Conference on Machine
 Learning, in Proceedings of Machine Learning Research 189:17-31 Available from https://proceedings.mlr.press/v189/alami23a.html.

Bayesian Change-Point Detection for Bandit Feedback in Non-stationary Environments

Abstract

Cite this Paper

Related Material