Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:1361-1385, 2025.

Abstract

We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an O(d log(dT)) problem-dependent upper bound for the 1/2-approximate pseudo-regret and an O(d T^{2/3} log(dT)^{1/3}) problem-free one, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes for the upper bounds.
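For intuition about the Double-Greedy template that DG-ETC adapts, here is a minimal sketch of the classical *offline deterministic* Double-Greedy algorithm (Buchbinder et al.), which guarantees a 1/3-approximation for nonnegative submodular maximization; the randomized variant achieves the 1/2 factor referenced in the abstract. This is not the paper's bandit algorithm — it is the exact-value building block, illustrated on a toy cut function:

```python
from itertools import combinations

# Toy non-monotone submodular function: the cut function of a small graph.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]

def cut_value(subset):
    """Number of edges with exactly one endpoint in `subset`."""
    s = set(subset)
    return sum((u in s) != (v in s) for u, v in EDGES)

def deterministic_double_greedy(f, ground_set):
    """Offline deterministic Double-Greedy: maintain X ⊆ Y, sweep the
    ground set once, and for each element keep it (add to X) or drop it
    (remove from Y) depending on which marginal gain is larger."""
    x, y = set(), set(ground_set)
    for i in ground_set:
        a = f(x | {i}) - f(x)   # marginal gain of adding i to X
        b = f(y - {i}) - f(y)   # marginal gain of removing i from Y
        if a >= b:
            x.add(i)
        else:
            y.discard(i)
    return x  # x == y after the sweep

solution = deterministic_double_greedy(cut_value, list(range(4)))
```

In the bandit setting of the paper, the marginals `a` and `b` are not observed exactly and must instead be estimated from noisy reward samples, which is where the Explore-then-Commit component enters.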

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-zhou25a,
  title     = {Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit},
  author    = {Zhou, Julien and Gaillard, Pierre and Rahier, Thibaud and Arbel, Julyan},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {1361--1385},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/zhou25a/zhou25a.pdf},
  url       = {https://proceedings.mlr.press/v272/zhou25a.html},
  abstract  = {We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between logarithmic and polynomial regime for the upper bounds.}
}
Endnote
%0 Conference Paper
%T Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
%A Julien Zhou
%A Pierre Gaillard
%A Thibaud Rahier
%A Julyan Arbel
%B Proceedings of The 36th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Gautam Kamath
%E Po-Ling Loh
%F pmlr-v272-zhou25a
%I PMLR
%P 1361--1385
%U https://proceedings.mlr.press/v272/zhou25a.html
%V 272
%X We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between logarithmic and polynomial regime for the upper bounds.
APA
Zhou, J., Gaillard, P., Rahier, T. & Arbel, J. (2025). Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:1361-1385. Available from https://proceedings.mlr.press/v272/zhou25a.html.