Online Convex Optimization with Sublinear Noisy Probes

Simone Di Gregorio; Anupam Gupta; Stefano Leonardi; Matteo Russo

Online Convex Optimization with Sublinear Noisy Probes

Simone Di Gregorio, Anupam Gupta, Stefano Leonardi, Matteo Russo

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:1938-1962, 2026.

Abstract

We study Online Convex Optimization (OCO) over a convex set $K\subseteq \mathbb R^d$, where in each round $t$ the learner selects $x_t\in K$ and then observes a convex loss $f_t:K\to[0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear \emph{best-expert} queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k\le T$ \emph{pairwise probes}; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a \emph{sublinear and noisy} probe budget can provably improve worst-case regret in the full feedback OCO regime. With $k$ $\delta$-noisy pairwise probes, we obtain: $ {\textup{\textsc{Reg}}}_T \le O\left(\min\left\{\sqrt{dT\ln T},; \frac{dT\ln T}{k|1-2\delta|}\right\}\right) $, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $\delta$. Specifically regarding the noise parameter $\delta \in [0,1]$, the regret guarantee smoothly degrades as the oracle response approaches a coin flip, i.e., $\delta$ is close to $\frac{1}{2}$. When applying the same techniques to a finite $K$ for the prediction with $d$ experts setting, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-di-gregorio26a,
  title = 	 {Online Convex Optimization with Sublinear Noisy Probes},
  author =       {{Di Gregorio}, Simone and Gupta, Anupam and Leonardi, Stefano and Russo, Matteo},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {1938--1962},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/di-gregorio26a/di-gregorio26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/di-gregorio26a.html},
  abstract = 	 { We study Online Convex Optimization (OCO) over a convex set $K\subseteq \mathbb R^d$, where in each round $t$ the learner selects $x_t\in K$ and then observes a convex loss $f_t:K\to[0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear \emph{best-expert} queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k\le T$ \emph{pairwise probes}; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a \emph{sublinear and noisy} probe budget can provably improve worst-case regret in the full feedback OCO regime. With $k$ $\delta$-noisy pairwise probes, we obtain: $ {\textup{\textsc{Reg}}}_T \le O\left(\min\left\{\sqrt{dT\ln T},; \frac{dT\ln T}{k|1-2\delta|}\right\}\right) $, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $\delta$. Specifically regarding the noise parameter $\delta \in [0,1]$,  the regret guarantee smoothly degrades as the oracle response approaches a coin flip, i.e., $\delta$ is close to $\frac{1}{2}$. When applying the same techniques to a finite $K$ for the prediction with $d$ experts setting, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.}
}

Endnote

%0 Conference Paper
%T Online Convex Optimization with Sublinear Noisy Probes
%A Simone Di Gregorio
%A Anupam Gupta
%A Stefano Leonardi
%A Matteo Russo
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-di-gregorio26a
%I PMLR
%P 1938--1962
%U https://proceedings.mlr.press/v336/di-gregorio26a.html
%V 336
%X  We study Online Convex Optimization (OCO) over a convex set $K\subseteq \mathbb R^d$, where in each round $t$ the learner selects $x_t\in K$ and then observes a convex loss $f_t:K\to[0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear \emph{best-expert} queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k\le T$ \emph{pairwise probes}; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a \emph{sublinear and noisy} probe budget can provably improve worst-case regret in the full feedback OCO regime. With $k$ $\delta$-noisy pairwise probes, we obtain: $ {\textup{\textsc{Reg}}}_T \le O\left(\min\left\{\sqrt{dT\ln T},; \frac{dT\ln T}{k|1-2\delta|}\right\}\right) $, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $\delta$. Specifically regarding the noise parameter $\delta \in [0,1]$,  the regret guarantee smoothly degrades as the oracle response approaches a coin flip, i.e., $\delta$ is close to $\frac{1}{2}$. When applying the same techniques to a finite $K$ for the prediction with $d$ experts setting, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.

APA

Di Gregorio, S., Gupta, A., Leonardi, S. & Russo, M.. (2026). Online Convex Optimization with Sublinear Noisy Probes. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:1938-1962 Available from https://proceedings.mlr.press/v336/di-gregorio26a.html.

Related Material

Download PDF