Piecewise Stationary Bandits under Risk Criteria

Sujay Bhatt; Guanhua Fang; Ping Li

Piecewise Stationary Bandits under Risk Criteria

Sujay Bhatt, Guanhua Fang, Ping Li

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:4313-4335, 2023.

Abstract

Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii)We then propose a truncation-based UCB-type bandit algorithm integrating the above regime change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish the regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the general change detection algorithm, along with a novel regret analysis.

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-bhatt23b,
  title = 	 {Piecewise Stationary Bandits under Risk Criteria},
  author =       {Bhatt, Sujay and Fang, Guanhua and Li, Ping},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {4313--4335},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/bhatt23b/bhatt23b.pdf},
  url = 	 {https://proceedings.mlr.press/v206/bhatt23b.html},
  abstract = 	 {Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii)We then propose a truncation-based UCB-type bandit algorithm integrating the above regime change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish the regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the general change detection algorithm, along with a novel regret analysis.}
}

Endnote

%0 Conference Paper
%T Piecewise Stationary Bandits under Risk Criteria
%A Sujay Bhatt
%A Guanhua Fang
%A Ping Li
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-bhatt23b
%I PMLR
%P 4313--4335
%U https://proceedings.mlr.press/v206/bhatt23b.html
%V 206
%X Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii)We then propose a truncation-based UCB-type bandit algorithm integrating the above regime change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish the regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the general change detection algorithm, along with a novel regret analysis.

APA

Bhatt, S., Fang, G. & Li, P.. (2023). Piecewise Stationary Bandits under Risk Criteria. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:4313-4335 Available from https://proceedings.mlr.press/v206/bhatt23b.html.

Related Material

Download PDF