On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

Leo Maynard-Zhang; Zhihan Xiong; Kevin Jamieson; Maryam Fazel

On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

Leo Maynard-Zhang, Zhihan Xiong, Kevin Jamieson, Maryam Fazel

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5021-5052, 2026.

Abstract

We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$ with high probability. In this setting, it is well-known that i.i.d. sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an \textit{arm-set-dependent} lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the \textit{Adjacent-optimal design}, a specialization of the well-known $\mathcal{XY}$-optimal design, and develop the \textsf{Adjacent-BAI} algorithm. We prove that the error probability of \textsf{Adjacent-BAI} matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-maynard-zhang26a,
  title = 	 {On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits},
  author =       {Maynard-Zhang, Leo and Xiong, Zhihan and Jamieson, Kevin and Fazel, Maryam},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {5021--5052},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/maynard-zhang26a/maynard-zhang26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/maynard-zhang26a.html},
  abstract = 	 {We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits.  Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$ with high probability. In this setting, it is well-known that i.i.d. sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T /  H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an \textit{arm-set-dependent} lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the \textit{Adjacent-optimal design}, a specialization of the well-known $\mathcal{XY}$-optimal design, and develop the \textsf{Adjacent-BAI} algorithm. We prove that the error probability of \textsf{Adjacent-BAI} matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.}
}

Endnote

%0 Conference Paper
%T On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
%A Leo Maynard-Zhang
%A Zhihan Xiong
%A Kevin Jamieson
%A Maryam Fazel
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-maynard-zhang26a
%I PMLR
%P 5021--5052
%U https://proceedings.mlr.press/v336/maynard-zhang26a.html
%V 336
%X We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits.  Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$ with high probability. In this setting, it is well-known that i.i.d. sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T /  H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an \textit{arm-set-dependent} lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the \textit{Adjacent-optimal design}, a specialization of the well-known $\mathcal{XY}$-optimal design, and develop the \textsf{Adjacent-BAI} algorithm. We prove that the error probability of \textsf{Adjacent-BAI} matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.

APA

Maynard-Zhang, L., Xiong, Z., Jamieson, K. & Fazel, M.. (2026). On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5021-5052 Available from https://proceedings.mlr.press/v336/maynard-zhang26a.html.

Related Material

Download PDF