Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs

Alexander Ryabchenko, Idan Attias, Daniel M. Roy
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:4973-5014, 2025.

Abstract

We study online learning with oblivious losses and delays under a novel “capacity constraint” that limits how many past rounds can be tracked simultaneously for delayed feedback. Under “clairvoyance” (i.e., delay durations are revealed upfront each round) and/or “preemptibility” (i.e., we have ability to stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the “optimal capacity” needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding ${\widetilde{O}(d_{\max})}$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta(\sqrt{TK(1+d/C)+Td\log(K)})$ and the optimal capacity is $\Theta(\min\{K/\log(K),d\})$ in the bandit setting, while in the full-information feedback setting, the minimax regret is $\Theta(\sqrt{T(d+1)\log(K)})$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel preemptive and non-preemptive scheduling policies, based on Pareto-distributed proxy delays, and batching techniques, respectively. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-ryabchenko25a, title = {Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs}, author = {Ryabchenko, Alexander and Attias, Idan and Roy, Daniel M.}, booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory}, pages = {4973--5014}, year = {2025}, editor = {Haghtalab, Nika and Moitra, Ankur}, volume = {291}, series = {Proceedings of Machine Learning Research}, month = {30 Jun--04 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/ryabchenko25a/ryabchenko25a.pdf}, url = {https://proceedings.mlr.press/v291/ryabchenko25a.html}, abstract = {We study online learning with oblivious losses and delays under a novel “capacity constraint” that limits how many past rounds can be tracked simultaneously for delayed feedback. Under “clairvoyance” (i.e., delay durations are revealed upfront each round) and/or “preemptibility” (i.e., we have ability to stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the “optimal capacity” needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding ${\widetilde{O}(d_{\max})}$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta(\sqrt{TK(1+d/C)+Td\log(K)})$ and the optimal capacity is $\Theta(\min\{K/\log(K),d\})$ in the bandit setting, while in the full-information feedback setting, the minimax regret is $\Theta(\sqrt{T(d+1)\log(K)})$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel preemptive and non-preemptive scheduling policies, based on Pareto-distributed proxy delays, and batching techniques, respectively. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity. } }
Endnote
%0 Conference Paper %T Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs %A Alexander Ryabchenko %A Idan Attias %A Daniel M. Roy %B Proceedings of Thirty Eighth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2025 %E Nika Haghtalab %E Ankur Moitra %F pmlr-v291-ryabchenko25a %I PMLR %P 4973--5014 %U https://proceedings.mlr.press/v291/ryabchenko25a.html %V 291 %X We study online learning with oblivious losses and delays under a novel “capacity constraint” that limits how many past rounds can be tracked simultaneously for delayed feedback. Under “clairvoyance” (i.e., delay durations are revealed upfront each round) and/or “preemptibility” (i.e., we have ability to stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the “optimal capacity” needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding ${\widetilde{O}(d_{\max})}$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta(\sqrt{TK(1+d/C)+Td\log(K)})$ and the optimal capacity is $\Theta(\min\{K/\log(K),d\})$ in the bandit setting, while in the full-information feedback setting, the minimax regret is $\Theta(\sqrt{T(d+1)\log(K)})$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel preemptive and non-preemptive scheduling policies, based on Pareto-distributed proxy delays, and batching techniques, respectively. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
APA
Ryabchenko, A., Attias, I. & Roy, D.M.. (2025). Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:4973-5014 Available from https://proceedings.mlr.press/v291/ryabchenko25a.html.

Related Material