Stochastic Bandits with ReLU Neural Networks

Kan Xu; Hamsa Bastani; Surbhi Goel; Osbert Bastani

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:54866-54887, 2024.

Abstract

We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
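The piecewise-linear insight in the abstract can be illustrated with a small Python sketch. It is not the authors' OFU-ReLU implementation. It assumes, for illustration only, that the expected reward of an arm x is a sum of K ReLU units, sum_k max(0, <theta_k, x>), over a finite set of arms, and it replaces the random-exploration stage with a hypothetical pilot estimate `theta_hat`. Given that estimate, each arm is mapped to phi(x), which stacks 1{<theta_hat_k, x> > 0} x for every unit. Whenever the estimated activation pattern agrees with the true one, the reward is exactly linear in phi(x) with parameter (theta_1, ..., theta_K), so a standard LinUCB-style loop (ridge estimate plus an optimistic confidence width) can be run on phi. The helper names, the toy instance, and the constants `alpha`, `lam`, and `noise` are all hypothetical.

```python
import numpy as np


def relu_reward(thetas, x):
    """Noiseless reward under the ASSUMED model: sum_k max(0, <theta_k, x>)."""
    return float(np.maximum(0.0, thetas @ x).sum())


def transformed_features(theta_hat, x):
    """phi(x): stack 1{<theta_hat_k, x> > 0} * x over the K units (length K * d).

    When theta_hat induces the same activation pattern at x as the true thetas,
    relu_reward(thetas, x) equals thetas.ravel() @ phi(x), i.e. it is linear in phi.
    """
    active = (theta_hat @ x > 0).astype(float)     # shape (K,)
    return (active[:, None] * x[None, :]).ravel()  # shape (K * d,)


def linucb_phase(theta_hat, thetas, arms, T, alpha=1.0, lam=1.0, noise=0.1, seed=0):
    """UCB-type linear bandit on phi(x); returns the chosen arm index per round.

    thetas is used only to simulate noisy rewards; the learner sees theta_hat.
    """
    rng = np.random.default_rng(seed)
    phis = np.array([transformed_features(theta_hat, a) for a in arms])
    A = lam * np.eye(phis.shape[1])  # regularized Gram matrix
    b = np.zeros(phis.shape[1])      # sum of reward-weighted features
    chosen = []
    for _ in range(T):
        A_inv = np.linalg.inv(A)
        w = A_inv @ b  # ridge-regression estimate of the linear parameter
        quad = np.einsum("ij,jk,ik->i", phis, A_inv, phis)  # phi^T A^{-1} phi per arm
        ucb = phis @ w + alpha * np.sqrt(np.maximum(quad, 0.0))  # optimistic index
        i = int(np.argmax(ucb))
        r = relu_reward(thetas, arms[i]) + noise * rng.standard_normal()
        A += np.outer(phis[i], phis[i])
        b += r * phis[i]
        chosen.append(i)
    return chosen


# Toy instance (hypothetical): K = 2 ReLU units in d = 2 dimensions.
thetas = np.array([[1.0, 0.0], [0.0, 1.0]])
theta_hat = np.array([[0.9, 0.1], [0.05, 1.1]])  # stand-in for the exploration-stage estimate
arms = np.array([[0.96, -0.28], [np.sqrt(0.5), np.sqrt(0.5)], [-0.28, 0.96], [-0.6, -0.8]])

for a in arms:  # theta_hat reproduces the true activation pattern on every arm here
    assert np.isclose(relu_reward(thetas, a), thetas.ravel() @ transformed_features(theta_hat, a))

chosen = linucb_phase(theta_hat, thetas, arms, T=300)
print("pulls of the best arm (index 1) in the last 100 rounds:", chosen[-100:].count(1))
```

In this toy instance the expected rewards are 0.96, about 1.414, 0.96, and 0, and the loop should concentrate on the arm at 45 degrees after a handful of exploratory pulls. If `theta_hat` got the sign of <theta_k, x> wrong for some arm, phi(x) would be misspecified there, which is why the exploration stage must learn the ReLU parameters accurately before switching to the linear bandit.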

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-xu24c,
  title = 	 {Stochastic Bandits with {R}e{LU} Neural Networks},
  author =       {Xu, Kan and Bastani, Hamsa and Goel, Surbhi and Bastani, Osbert},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {54866--54887},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24c/xu24c.pdf},
  url = 	 {https://proceedings.mlr.press/v235/xu24c.html},
  abstract = 	 {We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.}
}

Endnote

%0 Conference Paper
%T Stochastic Bandits with ReLU Neural Networks
%A Kan Xu
%A Hamsa Bastani
%A Surbhi Goel
%A Osbert Bastani
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xu24c
%I PMLR
%P 54866--54887
%U https://proceedings.mlr.press/v235/xu24c.html
%V 235
%X We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.

APA


Xu, K., Bastani, H., Goel, S. & Bastani, O. (2024). Stochastic Bandits with ReLU Neural Networks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:54866-54887. Available from https://proceedings.mlr.press/v235/xu24c.html.
