Constant regret for sequence prediction with limited advice

El Mehdi Saad; Gilles Blanchard

Constant regret for sequence prediction with limited advice

El Mehdi Saad, Gilles Blanchard

Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:1343-1386, 2023.

Abstract

We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K under limited access to information. We assume that in each round, the learner can predict using a convex combination of at most p experts for prediction, then they can observe a posteriori the losses of at most m experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandits setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of the order O(sqrt{KT}). We show that allowing the learner to play one additional expert per round and observe one additional feedback, improves substantially the guarantees on regret. We provide a strategy combining only p=2 experts per round for prediction and observing m \ge 2 experts’ losses. Its randomized regret (wrt. internal randomization of the learners’ strategy) is of order O((K/m) log(K delta^{-1})) with probability 1- delta, i.e., is independent of the horizon T (“constant” or “fast rate” regret) if (p \ge 2 and m \ge 3). We prove that this rate is optimal up to a logarithmic factor in K. In the case p=m=2, we provide an upper bound of order O(K^2 \log(K delta^{-1})), with probability 1-delta. Our strategies do not require any prior knowledge of the horizon T nor of the confidence parameter \delta. Finally, we show that if the learner is constrained to observe only one expert feedback per round, the worst-case regret is the “slow rate” Omega(sqrt{KT}), suggesting that synchronous observation of at least two experts per round is necessary to have a constant regret.

Cite this Paper

BibTeX


@InProceedings{pmlr-v201-saad23a,
  title = 	 {Constant regret for sequence prediction with limited advice},
  author =       {Saad, El Mehdi and Blanchard, Gilles},
  booktitle = 	 {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  pages = 	 {1343--1386},
  year = 	 {2023},
  editor = 	 {Agrawal, Shipra and Orabona, Francesco},
  volume = 	 {201},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {20 Feb--23 Feb},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v201/saad23a/saad23a.pdf},
  url = 	 {https://proceedings.mlr.press/v201/saad23a.html},
  abstract = 	 {We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K under limited access to information. We assume that in each round, the learner can predict using a convex combination of at most p experts for prediction, then they can observe a posteriori the losses of at most m experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandits setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of the order O(sqrt{KT}). We show that allowing the learner to play one additional expert per round and observe one additional feedback, improves substantially the guarantees on regret. We provide a strategy combining only p=2 experts per round for prediction and observing m \ge 2 experts’ losses. Its randomized regret (wrt. internal randomization of the learners’ strategy) is of order O((K/m) log(K delta^{-1})) with probability 1- delta, i.e., is independent of the horizon T (“constant” or “fast rate” regret) if (p \ge 2 and m \ge 3). We prove that this rate is optimal up to a logarithmic factor in K. In the case p=m=2, we provide an upper bound of order O(K^2 \log(K delta^{-1})), with probability 1-delta. Our strategies do not require any prior knowledge of the horizon T nor of the confidence parameter \delta. Finally, we show that if the learner is constrained to observe only one expert feedback per round, the worst-case regret is the “slow rate” Omega(sqrt{KT}), suggesting that synchronous observation of at least two experts per round is necessary to have a constant regret. }
}

Endnote

%0 Conference Paper
%T Constant regret for sequence prediction with limited advice
%A El Mehdi Saad
%A Gilles Blanchard
%B Proceedings of The 34th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Shipra Agrawal
%E Francesco Orabona	
%F pmlr-v201-saad23a
%I PMLR
%P 1343--1386
%U https://proceedings.mlr.press/v201/saad23a.html
%V 201
%X We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K under limited access to information. We assume that in each round, the learner can predict using a convex combination of at most p experts for prediction, then they can observe a posteriori the losses of at most m experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandits setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of the order O(sqrt{KT}). We show that allowing the learner to play one additional expert per round and observe one additional feedback, improves substantially the guarantees on regret. We provide a strategy combining only p=2 experts per round for prediction and observing m \ge 2 experts’ losses. Its randomized regret (wrt. internal randomization of the learners’ strategy) is of order O((K/m) log(K delta^{-1})) with probability 1- delta, i.e., is independent of the horizon T (“constant” or “fast rate” regret) if (p \ge 2 and m \ge 3). We prove that this rate is optimal up to a logarithmic factor in K. In the case p=m=2, we provide an upper bound of order O(K^2 \log(K delta^{-1})), with probability 1-delta. Our strategies do not require any prior knowledge of the horizon T nor of the confidence parameter \delta. Finally, we show that if the learner is constrained to observe only one expert feedback per round, the worst-case regret is the “slow rate” Omega(sqrt{KT}), suggesting that synchronous observation of at least two experts per round is necessary to have a constant regret.

APA


Saad, E.M. & Blanchard, G.. (2023). Constant regret for sequence prediction with limited advice. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:1343-1386 Available from https://proceedings.mlr.press/v201/saad23a.html.

Related Material

Download PDF