Linear Bandits on Ellipsoids: Minimax Optimal Algorithms

Raymond Zhang, Hédi Hadiji, Richard Combes
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:6016-6040, 2025.

Abstract

We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $\sigma^2$ the noise variance, $A$ a matrix defining the set of actions and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound to a multiplicative universal constant. The algorithm is non-classical in the sense that it is not optimistic, and it is not a sampling algorithm. The main idea is to combine a novel sequential procedure to estimate $\|\theta\|$, followed by an explore-and-commit strategy informed by this estimate. The algorithm is highly computationally efficient, and a run requires only time $O(dT + d^2 \log(T/d) + d^3)$ and memory $O(d^2)$, in contrast with known optimistic algorithms, which are not implementable in polynomial time. We go beyond minimax optimality and show that our algorithm is locally asymptotically minimax optimal, a much stronger notion of optimality. We further provide numerical experiments to illustrate our theoretical findings. The code to reproduce the experiments is available at \url{https://github.com/RaymZhang/LinearBanditsEllipsoidsMinimaxCOLT}.
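The abstract's strategy (estimate the parameter during an exploration phase, then commit to the best action on the ellipsoid) can be illustrated with a minimal sketch. This is not the paper's algorithm: the paper tunes the exploration length via a novel sequential estimate of $\|\theta\|$, whereas the sketch below uses a crude fixed exploration budget; the function name and all parameters are illustrative assumptions. It uses the fact that, on the ellipsoid $\{x : x^\top A^{-1} x \le 1\}$, the reward-maximizing action is $A\theta / \|\theta\|_A$ with value $\|\theta\|_A = \sqrt{\theta^\top A \theta}$.

```python
import numpy as np

def explore_then_commit(A, theta, T, sigma=0.1, n_explore=None, rng=None):
    """Illustrative explore-then-commit sketch for a linear bandit whose
    action set is the ellipsoid {x : x^T A^{-1} x <= 1}.

    NOT the paper's algorithm: the paper first runs a sequential procedure
    to estimate ||theta||_A and uses it to set the exploration length;
    here a fixed budget is used for simplicity. `theta` is only used to
    simulate rewards and compute regret.
    """
    rng = np.random.default_rng(rng)
    d = len(theta)
    if n_explore is None:
        n_explore = max(d, int(np.sqrt(T)))  # crude fixed budget
    # Exploration actions: coordinate directions scaled onto the ellipsoid,
    # i.e. e_i rescaled so that x^T A^{-1} x = 1.
    Ainv = np.linalg.inv(A)
    dirs = [np.eye(d)[i] / np.sqrt(Ainv[i, i]) for i in range(d)]
    X, y = [], []
    for t in range(n_explore):
        x = dirs[t % d]
        X.append(x)
        y.append(x @ theta + sigma * rng.standard_normal())  # noisy reward
    X, y = np.array(X), np.array(y)
    # Least-squares estimate of theta from the exploration rounds.
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    # Commit: the maximizer of <theta_hat, x> over the ellipsoid.
    norm_A = np.sqrt(theta_hat @ A @ theta_hat)
    x_commit = A @ theta_hat / norm_A if norm_A > 0 else dirs[0]
    # Regret against the optimal per-round value ||theta||_A.
    opt = np.sqrt(theta @ A @ theta)
    regret = sum(opt - x @ theta for x in X)
    regret += (T - n_explore) * (opt - x_commit @ theta)
    return x_commit, regret
```

With a well-conditioned $A$ and small noise, the committed action lands close to $A\theta/\|\theta\|_A$ and almost all of the regret is incurred during exploration, which is the trade-off the paper's adaptive exploration length optimizes.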

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-zhang25b,
  title     = {Linear Bandits on Ellipsoids: Minimax Optimal Algorithms},
  author    = {Zhang, Raymond and Hadiji, H{\'e}di and Combes, Richard},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {6016--6040},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/zhang25b/zhang25b.pdf},
  url       = {https://proceedings.mlr.press/v291/zhang25b.html},
  abstract  = {We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $\sigma^2$ the noise variance, $A$ a matrix defining the set of actions and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound to a multiplicative universal constant. The algorithm is non-classical in the sense that it is not optimistic, and it is not a sampling algorithm. The main idea is to combine a novel sequential procedure to estimate $\|\theta\|$, followed by an explore-and-commit strategy informed by this estimate. The algorithm is highly computationally efficient, and a run requires only time $O(dT + d^2 \log(T/d) + d^3)$ and memory $O(d^2)$, in contrast with known optimistic algorithms, which are not implementable in polynomial time. We go beyond minimax optimality and show that our algorithm is locally asymptotically minimax optimal, a much stronger notion of optimality. We further provide numerical experiments to illustrate our theoretical findings. The code to reproduce the experiments is available at \url{https://github.com/RaymZhang/LinearBanditsEllipsoidsMinimaxCOLT}.}
}
Endnote
%0 Conference Paper
%T Linear Bandits on Ellipsoids: Minimax Optimal Algorithms
%A Raymond Zhang
%A Hédi Hadiji
%A Richard Combes
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-zhang25b
%I PMLR
%P 6016--6040
%U https://proceedings.mlr.press/v291/zhang25b.html
%V 291
%X We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $\sigma^2$ the noise variance, $A$ a matrix defining the set of actions and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound to a multiplicative universal constant. The algorithm is non-classical in the sense that it is not optimistic, and it is not a sampling algorithm. The main idea is to combine a novel sequential procedure to estimate $\|\theta\|$, followed by an explore-and-commit strategy informed by this estimate. The algorithm is highly computationally efficient, and a run requires only time $O(dT + d^2 \log(T/d) + d^3)$ and memory $O(d^2)$, in contrast with known optimistic algorithms, which are not implementable in polynomial time. We go beyond minimax optimality and show that our algorithm is locally asymptotically minimax optimal, a much stronger notion of optimality. We further provide numerical experiments to illustrate our theoretical findings. The code to reproduce the experiments is available at \url{https://github.com/RaymZhang/LinearBanditsEllipsoidsMinimaxCOLT}.
APA
Zhang, R., Hadiji, H. & Combes, R. (2025). Linear Bandits on Ellipsoids: Minimax Optimal Algorithms. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:6016-6040. Available from https://proceedings.mlr.press/v291/zhang25b.html.
