Bandit Online Linear Optimization with Hints and Queries

Aditya Bhaskara; Ashok Cutkosky; Ravi Kumar; Manish Purohit

Bandit Online Linear Optimization with Hints and Queries

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:2313-2336, 2023.

Abstract

We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” towards improving regret bounds for OLO problems in the full-information setting. Unlike in the full-information OLO setting, with bandit feedback, we first show that one cannot improve the standard regret bounds of $\tilde{O}(\sqrt{T})$ by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show $O(\log T)$ regret is achievable. We then show how to make this result more robust—when some of the query responses can be adversarial—by using a little feedback on the quality of the responses.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-bhaskara23a,
  title = 	 {Bandit Online Linear Optimization with Hints and Queries},
  author =       {Bhaskara, Aditya and Cutkosky, Ashok and Kumar, Ravi and Purohit, Manish},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {2313--2336},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/bhaskara23a/bhaskara23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/bhaskara23a.html},
  abstract = 	 {We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” towards improving regret bounds for OLO problems in the full-information setting. Unlike in the full-information OLO setting, with bandit feedback, we first show that one cannot improve the standard regret bounds of $\tilde{O}(\sqrt{T})$ by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show $O(\log T)$ regret is achievable. We then show how to make this result more robust—when some of the query responses can be adversarial—by using a little feedback on the quality of the responses.}
}

Endnote

%0 Conference Paper
%T Bandit Online Linear Optimization with Hints and Queries
%A Aditya Bhaskara
%A Ashok Cutkosky
%A Ravi Kumar
%A Manish Purohit
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-bhaskara23a
%I PMLR
%P 2313--2336
%U https://proceedings.mlr.press/v202/bhaskara23a.html
%V 202
%X We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” towards improving regret bounds for OLO problems in the full-information setting. Unlike in the full-information OLO setting, with bandit feedback, we first show that one cannot improve the standard regret bounds of $\tilde{O}(\sqrt{T})$ by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show $O(\log T)$ regret is achievable. We then show how to make this result more robust—when some of the query responses can be adversarial—by using a little feedback on the quality of the responses.

APA

Bhaskara, A., Cutkosky, A., Kumar, R. & Purohit, M.. (2023). Bandit Online Linear Optimization with Hints and Queries. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:2313-2336 Available from https://proceedings.mlr.press/v202/bhaskara23a.html.

Bandit Online Linear Optimization with Hints and Queries

Abstract

Cite this Paper

Related Material