[edit]
Online Realizable Regression and Applications for ReLU Networks
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:2105-2106, 2026.
Abstract
Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al., 2023 shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. For an hypothesis class $\mathcal{H}$, we define an entropy potential $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon) d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$ under the sup pseudo metric $\sup_x \ell(f(x),g(x))$, and show that for every $c$-approximate pseudo-metric loss it holds that $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c \cdot \Phi(\mathcal{H}))$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on effective dimension. We illustrate the method on two families. For the class $\mathcal{H}_L$ of all $L$-Lipschitz functions on $[-1,1]^d$ under the loss $\ell_q(y,y’)=|y-y’|^q$, we establish a sharp phase transition: if $q>d$ then $\mathbb{D}_{\mathrm{onl}}(\mathcal{H}_L)=\Theta_{d,q}(L^d)$, and the bound is achievable efficiently, whereas if $q\le d$ then $\mathbb{D}_{\mathrm{onl}}(\mathcal{H}_L)=\infty$. Complementing these metric-specific results, for any continuous loss with $\ell(y,y)=0$, the loss along realizable sequences in $\mathcal{H}_L$ satisfies $\ell(\hat y_t,y_t)\to 0$ as $t\to\infty$. As a second application, we study bounded-norm $k$-ReLU networks over $[-1,1]^d$ with squared loss and highlight a regression–classification separation: realizable online classification is impossible already for $k=2,d=1$ under $0/1$ loss, yet realizable regression admits finite total loss, including a $\widetilde O(k^2)$ cumulative-loss upper bound, a matching lower bound up to polylogarithmic factors, and an efficient $O(1)$ guarantee for a single ReLU ($k=1$) independent of the input dimension $d$. Assuming Gap-ETH, we also rule out efficient proper online learners that achieve realizable accumulated loss $\widetilde o(d)$ for any constant $k\ge 2$.