Stochastic Bandits with ReLU Neural Networks

Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:54866-54887, 2024.

Abstract

We study the stochastic bandit problem with ReLU neural network structure. We show that an $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this setting, we propose an OFU-ReLU algorithm that achieves this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that, once we learn the ReLU parameters sufficiently accurately during the exploration stage, we can exploit the piecewise linear structure of the ReLU activations and convert the problem into a linear bandit in a transformed feature space. To remove our algorithm's dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which provides the same theoretical guarantee.
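To make the linearization insight concrete: for a reward model $f(x) = \sum_{j=1}^k v_j \, \mathrm{ReLU}(w_j^\top x)$, once the activation pattern $s_j(x) = \mathbb{1}[w_j^\top x > 0]$ is pinned down by estimates of the $w_j$, the reward is linear, $f(x) = \theta^\top \phi(x)$, in the transformed features $\phi(x) = (s_1(x)x, \dots, s_k(x)x)$ with $\theta = (v_1 w_1, \dots, v_k w_k)$. Below is a minimal Python sketch of this two-phase scheme under that model; the names (`fit_relu_net`, `phi`, `EXPLORE_ROUNDS`), the exploration length, and the LinUCB confidence width are illustrative assumptions, not the paper's exact algorithm or constants.

# Illustrative sketch only: explore, then commit to LinUCB on activation-pattern
# features. Names and hyperparameters are assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d, k, T, EXPLORE_ROUNDS = 5, 3, 2000, 500

# Ground-truth one-layer ReLU reward model (unknown to the learner).
W_true = rng.normal(size=(k, d))
v_true = rng.normal(size=k)

def reward(x):
    return v_true @ np.maximum(W_true @ x, 0.0) + 0.1 * rng.normal()

def sample_arms(n=20):
    """Draw a fresh action set of unit vectors each round."""
    A = rng.normal(size=(n, d))
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def fit_relu_net(X, y, k, steps=3000, lr=0.05):
    """Fit sum_j v_j ReLU(w_j^T x) to (X, y) by gradient descent on squared loss."""
    W = 0.5 * rng.normal(size=(k, X.shape[1]))
    v = 0.5 * rng.normal(size=k)
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)            # (n, k) hidden activations
        resid = H @ v - y                       # (n,) prediction residuals
        mask = (X @ W.T > 0).astype(float)      # ReLU subgradient indicator
        v -= lr * H.T @ resid / len(y)
        W -= lr * ((resid[:, None] * v) * mask).T @ X / len(y)
    return W, v

def phi(x, W_hat):
    """Per-unit activation indicator times x; the reward is linear in these
    features whenever the estimated activation pattern matches the true one."""
    s = (W_hat @ x > 0).astype(float)
    return (s[:, None] * x).ravel()             # (k*d,)

# Phase 1: pure random exploration, then estimate the ReLU parameters.
X_exp, y_exp = [], []
for _ in range(EXPLORE_ROUNDS):
    x = sample_arms()[rng.integers(20)]
    X_exp.append(x)
    y_exp.append(reward(x))
W_hat, _ = fit_relu_net(np.array(X_exp), np.array(y_exp), k)

# Phase 2: a standard LinUCB loop on the transformed features.
p = k * d
A, b, beta = np.eye(p), np.zeros(p), 1.0        # Gram matrix, reward-feature sum, width
for _ in range(T - EXPLORE_ROUNDS):
    arms = sample_arms()
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    feats = np.array([phi(x, W_hat) for x in arms])
    ucb = feats @ theta_hat + beta * np.sqrt(
        np.einsum('ij,jk,ik->i', feats, A_inv, feats))
    x = arms[int(np.argmax(ucb))]
    f = phi(x, W_hat)
    A += np.outer(f, f)
    b += reward(x) * f

Note that this sketch takes the number of hidden units $k$ and the exploration length as known in advance; the OFU-ReLU+ batching strategy is designed precisely to remove this kind of dependence on model parameters.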

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xu24c,
  title     = {Stochastic Bandits with {R}e{LU} Neural Networks},
  author    = {Xu, Kan and Bastani, Hamsa and Goel, Surbhi and Bastani, Osbert},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {54866--54887},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24c/xu24c.pdf},
  url       = {https://proceedings.mlr.press/v235/xu24c.html}
}
Endnote
%0 Conference Paper
%T Stochastic Bandits with ReLU Neural Networks
%A Kan Xu
%A Hamsa Bastani
%A Surbhi Goel
%A Osbert Bastani
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xu24c
%I PMLR
%P 54866--54887
%U https://proceedings.mlr.press/v235/xu24c.html
%V 235
APA
Xu, K., Bastani, H., Goel, S. & Bastani, O. (2024). Stochastic Bandits with ReLU Neural Networks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:54866-54887. Available from https://proceedings.mlr.press/v235/xu24c.html.
