Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

Jing Wang, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:65595-65617, 2025.

Abstract

We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work of Huang et al. [2024] develops a soft truncation method via adaptive Huber regression to address these limitations. However, their method suffers from an undesirable computational cost: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a one-pass algorithm based on the online mirror descent framework. Our method updates using only the current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to the current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\varepsilon}{2(1+\varepsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\varepsilon}{2(1+\varepsilon)}}\big)$, where $d$ is the dimension and $\nu_t^{1+\varepsilon}$ is the $(1+\varepsilon)$-th central moment of the reward at round $t$.
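To make the one-pass idea concrete, below is a minimal sketch, not the paper's algorithm: a mirror-descent-style update on the Huber loss, whose clipped gradient $\psi_\tau(u) = \mathrm{clip}(u, -\tau, \tau)$ realizes the "soft truncation" of large residuals, with the mirror map taken as the quadratic norm induced by a running regularized Gram matrix. The class name, the fixed threshold tau, and the step size eta are illustrative assumptions; the actual algorithm adapts the Huber threshold to the noise level and comes with the regret guarantee stated above.

import numpy as np

def huber_grad(residual, tau):
    # Derivative of the Huber loss: linear inside [-tau, tau],
    # constant magnitude outside (soft truncation of large residuals).
    return np.clip(residual, -tau, tau)

class OnePassHuberEstimator:
    """Hypothetical one-pass Huber-regression update (illustrative only)."""

    def __init__(self, d, tau=1.0, eta=0.5, lam=1.0):
        self.theta = np.zeros(d)      # current estimate of the parameter
        self.H = lam * np.eye(d)      # regularized running Gram matrix
        self.tau = tau
        self.eta = eta

    def update(self, x, r):
        # Uses only the current pair (x, r); per-round cost is
        # independent of the round index t and the horizon T.
        self.H += np.outer(x, x)
        grad = -huber_grad(r - x @ self.theta, self.tau) * x
        # Mirror descent step in the norm induced by H.
        self.theta -= self.eta * np.linalg.solve(self.H, grad)
        return self.theta

# Usage: heavy-tailed rewards via Student-t noise (df=2 has infinite variance).
est = OnePassHuberEstimator(d=3)
rng = np.random.default_rng(0)
theta_star = np.array([0.5, -0.2, 0.1])
for _ in range(1000):
    x = rng.normal(size=3)
    r = x @ theta_star + rng.standard_t(df=2)
    est.update(x, r)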

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-wang25et,
  title     = {Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update},
  author    = {Wang, Jing and Zhang, Yu-Jie and Zhao, Peng and Zhou, Zhi-Hua},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {65595--65617},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25et/wang25et.pdf},
  url       = {https://proceedings.mlr.press/v267/wang25et.html}
}

Endnote
%0 Conference Paper
%T Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
%A Jing Wang
%A Yu-Jie Zhang
%A Peng Zhao
%A Zhi-Hua Zhou
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wang25et
%I PMLR
%P 65595--65617
%U https://proceedings.mlr.press/v267/wang25et.html
%V 267

APA
Wang, J., Zhang, Y.-J., Zhao, P. & Zhou, Z.-H. (2025). Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:65595-65617. Available from https://proceedings.mlr.press/v267/wang25et.html.
