Zero-Inflated Bandits

Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:66194-66258, 2025.

Abstract

Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
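To make the setting concrete, a zero-inflated reward is zero with some probability and otherwise drawn from a positive-valued distribution. The sketch below (not the paper's algorithm) pairs such a reward model with a plain UCB1 learner; the arm parameters and the Exponential choice for the nonzero part are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_inflated_reward(p_nonzero, mean, rng):
    """Reward is 0 with prob. 1 - p_nonzero, else Exponential(mean)."""
    if rng.random() < p_nonzero:
        return rng.exponential(mean)
    return 0.0

# Two hypothetical arms: (prob. of nonzero reward, mean of nonzero part).
# Expected rewards are 0.1 * 1.0 = 0.1 and 0.5 * 1.0 = 0.5.
arms = [(0.1, 1.0), (0.5, 1.0)]

T = 5000
counts = np.zeros(len(arms))
sums = np.zeros(len(arms))
for t in range(1, T + 1):
    # Standard UCB1 index; unpulled arms get infinite priority.
    n = np.maximum(counts, 1)
    ucb = np.where(counts > 0,
                   sums / n + np.sqrt(2 * np.log(t) / n),
                   np.inf)
    a = int(np.argmax(ucb))
    counts[a] += 1
    sums[a] += zero_inflated_reward(*arms[a], rng)

print(counts)  # the higher-value arm accumulates most pulls
```

The paper's contribution is to exploit the zero-inflated structure itself (estimating the zero probability and the nonzero component separately) rather than treating rewards as a black box as UCB1 does here.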

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wei25l,
  title     = {Zero-Inflated Bandits},
  author    = {Wei, Haoyu and Wan, Runzhe and Shi, Lei and Song, Rui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {66194--66258},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wei25l/wei25l.pdf},
  url       = {https://proceedings.mlr.press/v267/wei25l.html},
  abstract  = {Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.}
}
Endnote
%0 Conference Paper
%T Zero-Inflated Bandits
%A Haoyu Wei
%A Runzhe Wan
%A Lei Shi
%A Rui Song
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wei25l
%I PMLR
%P 66194--66258
%U https://proceedings.mlr.press/v267/wei25l.html
%V 267
%X Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
APA
Wei, H., Wan, R., Shi, L. & Song, R. (2025). Zero-Inflated Bandits. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:66194-66258. Available from https://proceedings.mlr.press/v267/wei25l.html.