Open Problem: Finite-Time Instance Dependent Optimality for Stochastic Online Learning with Feedback Graphs

Teodor Vanislavov Marinov; Mehryar Mohri; Julian Zimmert

Open Problem: Finite-Time Instance Dependent Optimality for Stochastic Online Learning with Feedback Graphs

Teodor Vanislavov Marinov, Mehryar Mohri, Julian Zimmert

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:5644-5649, 2022.

Abstract

Both asymptotic and non-asymptotic instance dependent regret bounds are known for the stochastic multi-armed bandit problem. Such regret bounds are known to be tight up to lower order terms in the setting of Gaussian rewards (Garivier et al., 2019). We revisit the related problem of stochastic online learning with feedback graphs, where asymptotically optimal instance dependent algorithms are known. Surprisingly, the notion of optimal finite-time regret is not a uniquely defined property in this context and in general, it is decoupled from the asymptotic rate. We pose two open problems. First we ask for a characterization of the finite time instance-dependent optimal regret. Next, we ask for a characterization of the set of graphs for which the finite time regret is bounded by the asymptotically optimal rate, for reasonable values of the time horizon.

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-open-problem-marinov22a,
  title = 	 {Open Problem: Finite-Time Instance Dependent Optimality for Stochastic Online Learning with Feedback Graphs},
  author =       {Marinov, Teodor Vanislavov and Mohri, Mehryar and Zimmert, Julian},
  booktitle = 	 {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages = 	 {5644--5649},
  year = 	 {2022},
  editor = 	 {Loh, Po-Ling and Raginsky, Maxim},
  volume = 	 {178},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--05 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v178/open-problem-marinov22a/open-problem-marinov22a.pdf},
  url = 	 {https://proceedings.mlr.press/v178/open-problem-marinov22a.html},
  abstract = 	 {Both asymptotic and non-asymptotic instance dependent regret bounds are known for the stochastic multi-armed bandit problem. Such regret bounds are known to be tight up to lower order terms in the setting of Gaussian rewards (Garivier et al., 2019). We revisit the related problem of stochastic online learning with feedback graphs, where asymptotically optimal instance dependent algorithms are known. Surprisingly, the notion of optimal finite-time regret is not a uniquely defined property in this context and in general, it is decoupled from the asymptotic rate. We pose two open problems. First we ask for a characterization of the finite time instance-dependent optimal regret.  Next, we ask for a characterization of the set of graphs for which the finite time regret is bounded by the asymptotically optimal rate, for reasonable values of the time horizon.}
}

Endnote

%0 Conference Paper
%T Open Problem: Finite-Time Instance Dependent Optimality for Stochastic Online Learning with Feedback Graphs
%A Teodor Vanislavov Marinov
%A Mehryar Mohri
%A Julian Zimmert
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky	
%F pmlr-v178-open-problem-marinov22a
%I PMLR
%P 5644--5649
%U https://proceedings.mlr.press/v178/open-problem-marinov22a.html
%V 178
%X Both asymptotic and non-asymptotic instance dependent regret bounds are known for the stochastic multi-armed bandit problem. Such regret bounds are known to be tight up to lower order terms in the setting of Gaussian rewards (Garivier et al., 2019). We revisit the related problem of stochastic online learning with feedback graphs, where asymptotically optimal instance dependent algorithms are known. Surprisingly, the notion of optimal finite-time regret is not a uniquely defined property in this context and in general, it is decoupled from the asymptotic rate. We pose two open problems. First we ask for a characterization of the finite time instance-dependent optimal regret.  Next, we ask for a characterization of the set of graphs for which the finite time regret is bounded by the asymptotically optimal rate, for reasonable values of the time horizon.

APA


Marinov, T.V., Mohri, M. & Zimmert, J.. (2022). Open Problem: Finite-Time Instance Dependent Optimality for Stochastic Online Learning with Feedback Graphs. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:5644-5649 Available from https://proceedings.mlr.press/v178/open-problem-marinov22a.html.

Related Material

Download PDF