A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Zhihan Xiong; Romain Camilleri; Maryam Fazel; Lalit Jain; Kevin Jamieson

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1585-1593, 2024.

Abstract

We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set

$\mathcal{X}\subset\mathbb{R}^d$ , a fixed budget

$T$ , and an unpredictable sequence of parameters

$\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$ , an algorithm will aim to correctly identify the best arm

$x^* := \arg\max_{x\in\mathcal{X}}x^\top\sum_{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where

$\theta_t = \theta_1$ for all

$t$ and demonstrated that the error probability decreases as

$\exp(-T /\rho^*)$ for a problem-dependent constant

$\rho^*$ . But in many real-world

$A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over

$\mathcal{X}$ at each time then the error probability decreases as

$\exp(-T\Delta^2_{(1)}/d)$ , where

$\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$ . As there exist environments where

$\Delta_{(1)}^2/ d \ll 1/ \rho^*$ , we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-xiong24a,
  title = 	 {{A}/{B} Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity},
  author =       {Xiong, Zhihan and Camilleri, Romain and Fazel, Maryam and Jain, Lalit and Jamieson, Kevin},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1585--1593},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/xiong24a/xiong24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/xiong24a.html},
  abstract = 	 {We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x\in\mathcal{X}}x^\top\sum_{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T /\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world $A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments where $\Delta_{(1)}^2/ d \ll 1/ \rho^*$, we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.}
}

Endnote

%0 Conference Paper
%T A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
%A Zhihan Xiong
%A Romain Camilleri
%A Maryam Fazel
%A Lalit Jain
%A Kevin Jamieson
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-xiong24a
%I PMLR
%P 1585--1593
%U https://proceedings.mlr.press/v238/xiong24a.html
%V 238
%X We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x\in\mathcal{X}}x^\top\sum_{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T /\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world $A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments where $\Delta_{(1)}^2/ d \ll 1/ \rho^*$, we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.

APA

Xiong, Z., Camilleri, R., Fazel, M., Jain, L. & Jamieson, K.. (2024). A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1585-1593 Available from https://proceedings.mlr.press/v238/xiong24a.html.

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Abstract

Cite this Paper

Related Material