Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits

Julian Zimmert, Tor Lattimore
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:3285-3312, 2022.

Abstract

We introduce a modification of follow the regularised leader and combine it with the log-determinant potential and suitable loss estimators to prove that the minimax regret for adaptive adversarial linear bandits is at most $O(d \sqrt{T \log(T)})$, where $d$ is the dimension and $T$ is the number of rounds. By using exponential weights, we improve this bound to $O(\sqrt{dT\log(kT)})$ when the action set has size $k$. These results confirm an old conjecture. We also show that follow the regularised leader with the entropic barrier and suitable loss estimators has regret against an adaptive adversary of at most $O(d^2 \sqrt{T} \log(T))$ and can be implemented in polynomial time, which improves on the best known bound for an efficient algorithm of $O(d^{7/2} \sqrt{T} \,\mathrm{poly}(\log(T)))$ by Lee et al. (2020).
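For context, the generic follow-the-regularised-leader (FTRL) update referred to in the abstract selects each play by trading off the estimated cumulative loss against a convex regulariser $F$ (here the log-determinant potential or the entropic barrier). The paper's contribution is a modification of this template together with suitable loss estimators; the display below is only the standard, unmodified FTRL update, with generic notation ($\mathcal{X}$, $\hat{\ell}_s$, $\eta$, $F$) that is not taken from the paper:

$$
x_t \in \operatorname*{arg\,min}_{x \in \mathcal{X}} \;\left\langle \sum_{s=1}^{t-1} \hat{\ell}_s,\, x \right\rangle + \frac{1}{\eta} F(x),
$$

where $\mathcal{X} \subset \mathbb{R}^d$ is (the convex hull of) the action set, $\hat{\ell}_s$ is the loss estimator from round $s$, and $\eta > 0$ is a learning rate.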

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-zimmert22b, title = {Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits}, author = {Zimmert, Julian and Lattimore, Tor}, booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory}, pages = {3285--3312}, year = {2022}, editor = {Loh, Po-Ling and Raginsky, Maxim}, volume = {178}, series = {Proceedings of Machine Learning Research}, month = {02--05 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v178/zimmert22b/zimmert22b.pdf}, url = {https://proceedings.mlr.press/v178/zimmert22b.html}, abstract = {We introduce a modification of follow the regularised leader and combine it with the log-determinant potential and suitable loss estimators to prove that the minimax regret for adaptive adversarial linear bandits is at most $O(d \sqrt{T \log(T)})$, where $d$ is the dimension and $T$ is the number of rounds. By using exponential weights, we improve this bound to $O(\sqrt{dT\log(kT)})$ when the action set has size $k$. These results confirm an old conjecture. We also show that follow the regularised leader with the entropic barrier and suitable loss estimators has regret against an adaptive adversary of at most $O(d^2 \sqrt{T} \log(T))$ and can be implemented in polynomial time, which improves on the best known bound for an efficient algorithm of $O(d^{7/2} \sqrt{T} \,\mathrm{poly}(\log(T)))$ by Lee et al. (2020).} }
Endnote
%0 Conference Paper %T Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits %A Julian Zimmert %A Tor Lattimore %B Proceedings of Thirty Fifth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2022 %E Po-Ling Loh %E Maxim Raginsky %F pmlr-v178-zimmert22b %I PMLR %P 3285--3312 %U https://proceedings.mlr.press/v178/zimmert22b.html %V 178 %X We introduce a modification of follow the regularised leader and combine it with the log-determinant potential and suitable loss estimators to prove that the minimax regret for adaptive adversarial linear bandits is at most $O(d \sqrt{T \log(T)})$, where $d$ is the dimension and $T$ is the number of rounds. By using exponential weights, we improve this bound to $O(\sqrt{dT\log(kT)})$ when the action set has size $k$. These results confirm an old conjecture. We also show that follow the regularised leader with the entropic barrier and suitable loss estimators has regret against an adaptive adversary of at most $O(d^2 \sqrt{T} \log(T))$ and can be implemented in polynomial time, which improves on the best known bound for an efficient algorithm of $O(d^{7/2} \sqrt{T} \,\mathrm{poly}(\log(T)))$ by Lee et al. (2020).
APA
Zimmert, J. & Lattimore, T. (2022). Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:3285-3312. Available from https://proceedings.mlr.press/v178/zimmert22b.html.
