Open Problem: Tight Online Confidence Intervals for RKHS Elements

Sattar Vakili, Jonathan Scarlett, Tara Javidi
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:4647-4652, 2021.

Abstract

Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel-based bandit and reinforcement learning problems utilizes confidence intervals applicable to the elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the existing regret bounds for several kernelized bandit algorithms (e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof, and the main challenge seems to stem from the online (sequential) nature of the observation points. We formalize the question of online confidence intervals in the RKHS setting and overview the existing results.
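For context, the following is a brief sketch (ours, not taken from the paper) of the kind of online confidence bound the open problem concerns, in the style of the bound of Chowdhury and Gopalan (2017). Here mu_t and sigma_t denote the kernel ridge regression (GP posterior) mean and standard deviation after t adaptively chosen observations, B bounds the RKHS norm of f, R is the sub-Gaussian noise parameter, and gamma_t is the maximal information gain; the exact constants and indexing may differ from the paper's statement.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Standard online confidence bound for RKHS elements with adaptively chosen
% observation points: with probability at least 1 - \delta, simultaneously
% for all x and all t,
\[
  |f(x) - \mu_t(x)|
  \;\le\;
  \Big( B + R\sqrt{2\big(\gamma_t + 1 + \ln(1/\delta)\big)} \Big)\,\sigma_t(x).
\]
% Plugging this width into the usual GP-UCB analysis gives cumulative regret
% of order \tilde{O}(\gamma_T \sqrt{T}), which is sublinear only when
% \gamma_T = o(\sqrt{T}); for kernels such as the Matern kernel, this condition
% can fail when the smoothness is small relative to the dimension. The open
% problem asks whether the \sqrt{\gamma_t} factor in the interval width can be
% removed in the online (sequential) setting, as appears possible when the
% observation points are chosen non-adaptively.
\end{document}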

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-open-problem-vakili21a,
  title     = {Open Problem: Tight Online Confidence Intervals for RKHS Elements},
  author    = {Vakili, Sattar and Scarlett, Jonathan and Javidi, Tara},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {4647--4652},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/vakili21a/vakili21a.pdf},
  url       = {https://proceedings.mlr.press/v134/open-problem-vakili21a.html}
}
APA
Vakili, S., Scarlett, J. & Javidi, T. (2021). Open Problem: Tight Online Confidence Intervals for RKHS Elements. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:4647-4652. Available from https://proceedings.mlr.press/v134/open-problem-vakili21a.html.
