Experimental Design for Semiparametric Bandits

Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:3215-3252, 2025.

Abstract

We study finite-armed semiparametric bandits, where each arm’s reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{\mathcal{O}}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.
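As a point of reference, the standard semiparametric reward model behind this abstract can be sketched as follows (the notation here is illustrative and may differ from the paper's exact setup): at each round $t$ the learner selects an arm $a_t$ from a finite set with features $x_1,\dots,x_K \in \mathbb{R}^d$ and observes
\[
y_t \;=\; x_{a_t}^\top \theta^\ast \;+\; \nu_t \;+\; \eta_t,
\]
where $\theta^\ast \in \mathbb{R}^d$ is the unknown linear parameter, $\nu_t$ is an arm-independent and possibly adversarially chosen shift, and $\eta_t$ is zero-mean noise. Since $\nu_t$ is shared by all arms within a round, it cancels when arms are compared, so regret is measured on the linear part, and setting $\nu_t \equiv 0$ recovers the classical finite-armed linear bandit.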

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-kim25a,
  title     = {Experimental Design for Semiparametric Bandits},
  author    = {Kim, Seok-Jin and Kim, Gi-Soo and Oh, Min-hwan},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {3215--3252},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/kim25a/kim25a.pdf},
  url       = {https://proceedings.mlr.press/v291/kim25a.html}
}
APA
Kim, S., Kim, G., & Oh, M. (2025). Experimental Design for Semiparametric Bandits. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:3215-3252. Available from https://proceedings.mlr.press/v291/kim25a.html.
