Safety aware changepoint detection for piecewise i.i.d. bandits

Subhojyoti Mukherjee
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1402-1412, 2022.

Abstract

In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there exists a finite number of changepoints where the mean of some or all arms change simultaneously. We introduce the safety constraint studied in Wu et al. (2016) to this setting such that at any round the cumulative reward is above a constant factor of the default action reward. We propose two actively adaptive algorithms for this setting that satisfy the safety constraint, detect changepoints, and restart without the knowledge of the number of changepoints or their locations. We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit literature. We also provide the first matching lower bounds for this setting. Empirically, we show that our safety-aware algorithms match the performance of the state-of-the-art actively adaptive algorithms that do not satisfy the safety constraint.

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-mukherjee22a, title = {Safety aware changepoint detection for piecewise i.i.d. bandits}, author = {Mukherjee, Subhojyoti}, booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence}, pages = {1402--1412}, year = {2022}, editor = {Cussens, James and Zhang, Kun}, volume = {180}, series = {Proceedings of Machine Learning Research}, month = {01--05 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v180/mukherjee22a/mukherjee22a.pdf}, url = {https://proceedings.mlr.press/v180/mukherjee22a.html}, abstract = {In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there exists a finite number of changepoints where the mean of some or all arms change simultaneously. We introduce the safety constraint studied in Wu et al. (2016) to this setting such that at any round the cumulative reward is above a constant factor of the default action reward. We propose two actively adaptive algorithms for this setting that satisfy the safety constraint, detect changepoints, and restart without the knowledge of the number of changepoints or their locations. We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit literature. We also provide the first matching lower bounds for this setting. Empirically, we show that our safety-aware algorithms match the performance of the state-of-the-art actively adaptive algorithms that do not satisfy the safety constraint.} }
Endnote
%0 Conference Paper %T Safety aware changepoint detection for piecewise i.i.d. bandits %A Subhojyoti Mukherjee %B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2022 %E James Cussens %E Kun Zhang %F pmlr-v180-mukherjee22a %I PMLR %P 1402--1412 %U https://proceedings.mlr.press/v180/mukherjee22a.html %V 180 %X In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there exists a finite number of changepoints where the mean of some or all arms change simultaneously. We introduce the safety constraint studied in Wu et al. (2016) to this setting such that at any round the cumulative reward is above a constant factor of the default action reward. We propose two actively adaptive algorithms for this setting that satisfy the safety constraint, detect changepoints, and restart without the knowledge of the number of changepoints or their locations. We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit literature. We also provide the first matching lower bounds for this setting. Empirically, we show that our safety-aware algorithms match the performance of the state-of-the-art actively adaptive algorithms that do not satisfy the safety constraint.
APA
Mukherjee, S.. (2022). Safety aware changepoint detection for piecewise i.i.d. bandits. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1402-1412 Available from https://proceedings.mlr.press/v180/mukherjee22a.html.

Related Material