Heavy-tailed linear bandit with Huber regression

Minhyun Kang, Gi-Soo Kim
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1027-1036, 2023.

Abstract

Linear bandit algorithms have been extensively studied and have proven successful in sequential decision tasks despite their simplicity. Many algorithms, however, work under the assumption that the reward is the sum of a linear function of the observed contexts and a sub-Gaussian error. In practical applications, errors can be heavy-tailed, especially in financial data. In such reward environments, algorithms designed for sub-Gaussian errors may underexplore, resulting in suboptimal regret. In this paper, we relax the reward assumption and propose a novel linear bandit algorithm that also works well under heavy-tailed errors. The proposed algorithm utilizes Huber regression. When the contexts are stochastic with a positive definite covariance matrix and the $(1+\delta)$-th moment of the error is bounded by a constant, we show that the high-probability upper bound of the regret is $O(\sqrt{d}T^{\frac{1}{1+\delta}}(\log dT)^{\frac{\delta}{1+\delta}})$, where $d$ is the dimension of the context variables, $T$ is the time horizon, and $\delta\in (0,1]$. This bound improves on the state-of-the-art regret bound of the Median of Means and Truncation algorithm by factors of $\sqrt{\log T}$ and $\sqrt{d}$ for the case where the time horizon $T$ is unknown. We also remark that when $\delta=1$, the order is the same as the regret bound of linear bandit algorithms designed for sub-Gaussian errors. We support our theoretical findings with synthetic experiments.
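The following is a minimal, self-contained Python/NumPy sketch of the idea described in the abstract: a LinUCB-style linear bandit in which the ordinary least-squares estimate is replaced by a Huber-regression estimate, so that heavy-tailed reward noise does not destabilize the parameter estimate. The IRLS fitting routine, the ellipsoidal exploration bonus, the Huber threshold tau, the refit schedule, and the Student-t noise model below are illustrative assumptions, not the exact procedure or constants analyzed in the paper.

import numpy as np


def huber_regression(X, y, tau, ridge=1e-6, n_iter=50):
    # Fit theta minimizing the Huber loss of the residuals via iteratively
    # reweighted least squares; the small ridge term keeps the solve stable.
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ theta
        w = np.where(np.abs(r) <= tau, 1.0, tau / np.maximum(np.abs(r), 1e-12))
        A = X.T @ (w[:, None] * X) + ridge * np.eye(d)
        theta = np.linalg.solve(A, X.T @ (w * y))
    return theta


rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                      # context dimension, arms per round, horizon
theta_star = rng.normal(size=d) / np.sqrt(d)
tau, beta, lam = 2.0, 1.0, 1.0             # assumed Huber threshold, bonus scale, ridge

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
V = lam * np.eye(d)
cum_regret = 0.0

for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d)) / np.sqrt(d)   # stochastic contexts
    V_inv = np.linalg.inv(V)
    # Optimism: estimated mean reward plus an ellipsoidal exploration bonus.
    bonus = np.sqrt(np.einsum('kd,de,ke->k', contexts, V_inv, contexts))
    a = int(np.argmax(contexts @ theta_hat + beta * bonus))
    x = contexts[a]
    # Heavy-tailed error: Student-t noise with 1.5 degrees of freedom has a
    # bounded (1+delta)-th moment only for delta < 0.5 (infinite variance).
    y = x @ theta_star + rng.standard_t(df=1.5)
    X_hist.append(x)
    y_hist.append(y)
    V += np.outer(x, x)
    cum_regret += np.max(contexts @ theta_star) - x @ theta_star
    if t % 50 == 0:                        # refit the Huber estimate periodically
        theta_hat = huber_regression(np.array(X_hist), np.array(y_hist), tau)

print(f"cumulative regret after {T} rounds: {cum_regret:.2f}")

Replacing the Huber fit with plain least squares in this sketch typically yields noticeably higher regret under the same heavy-tailed noise, which is the qualitative behavior the paper's synthetic experiments examine.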

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-kang23a,
  title     = {Heavy-tailed linear bandit with {H}uber regression},
  author    = {Kang, Minhyun and Kim, Gi-Soo},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {1027--1036},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/kang23a/kang23a.pdf},
  url       = {https://proceedings.mlr.press/v216/kang23a.html}
}
Endnote
%0 Conference Paper
%T Heavy-tailed linear bandit with Huber regression
%A Minhyun Kang
%A Gi-Soo Kim
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-kang23a
%I PMLR
%P 1027--1036
%U https://proceedings.mlr.press/v216/kang23a.html
%V 216
APA
Kang, M. & Kim, G. (2023). Heavy-tailed linear bandit with Huber regression. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:1027-1036. Available from https://proceedings.mlr.press/v216/kang23a.html.
