Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients

Gwen Legate, Irina Rish, Eugene Belilovsky
Proceedings of The 4th Conference on Lifelong Learning Agents, PMLR 330:341-357, 2026.

Abstract

Federated learning enables collaborative model training across numerous edge devices while preserving the privacy of their local data; however, memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices falls below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training, which renders their data inaccessible and increases system-induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine-tuning, a limitation that we seek to correct. We devise a federated memory-efficient zeroth-order optimizer, $\textbf{ZOWarmUp}$, that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate the participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server, relying instead on only a small set of random seeds, which renders the uplink communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes.
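
The seed-replay trick that makes the uplink cheap is easiest to see in code. Below is a minimal sketch of the MeZO-style zeroth-order (SPSA) step that this line of work builds on; it is not the paper's implementation, and the function name zo_step, its signature, and the hyperparameter values are illustrative assumptions. The perturbation vector z is regenerated from a stored random seed rather than kept in memory, so a client can describe its entire update with a (seed, projected-gradient) pair.

# Minimal sketch of a MeZO-style SPSA step (illustrative; not the paper's code).
import torch

def zo_step(model, loss_fn, batch, lr=1e-3, eps=1e-3, seed=0):
    """Run one zeroth-order update in place; return the (seed, projected_grad)
    pair a server could replay to reproduce the update, so no gradient is
    ever transmitted."""

    def perturb(scale):
        # Regenerate the same z from the seed instead of storing it,
        # keeping memory at inference level (CPU tensors assumed here).
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1.0)                        # theta -> theta + eps*z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                        # theta -> theta - eps*z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                        # restore theta

        # Two-point gradient estimate projected onto z: a single scalar.
        projected_grad = float(loss_plus - loss_minus) / (2.0 * eps)

        # SGD step along z, regenerated once more from the same seed.
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen)
            p.data.add_(-lr * projected_grad * z)

    return seed, projected_grad

if __name__ == "__main__":
    # Toy usage: one ZO step on a linear regression model.
    model = torch.nn.Linear(10, 1)
    loss_fn = lambda m, b: torch.nn.functional.mse_loss(m(b[0]), b[1])
    batch = (torch.randn(8, 10), torch.randn(8, 1))
    seed, g = zo_step(model, loss_fn, batch, seed=42)
    print(seed, g)

Because the server knows the seed, it can regenerate z and replay the update exactly, so each client step costs two scalars of uplink rather than a full gradient, consistent with the negligible communication cost the abstract describes.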

Cite this Paper

BibTeX
@InProceedings{pmlr-v330-legate26a,
  title     = {Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients},
  author    = {Legate, Gwen and Rish, Irina and Belilovsky, Eugene},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  pages     = {341--357},
  year      = {2026},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Eaton, Eric and Liu, Bing and Mahmood, Rupam and Rannen-Triki, Amal},
  volume    = {330},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v330/main/assets/legate26a/legate26a.pdf},
  url       = {https://proceedings.mlr.press/v330/legate26a.html}
}
Endnote
%0 Conference Paper
%T Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients
%A Gwen Legate
%A Irina Rish
%A Eugene Belilovsky
%B Proceedings of The 4th Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2026
%E Sarath Chandar
%E Razvan Pascanu
%E Eric Eaton
%E Bing Liu
%E Rupam Mahmood
%E Amal Rannen-Triki
%F pmlr-v330-legate26a
%I PMLR
%P 341--357
%U https://proceedings.mlr.press/v330/legate26a.html
%V 330
APA
Legate, G., Rish, I. &amp; Belilovsky, E. (2026). Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients. Proceedings of The 4th Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 330:341-357. Available from https://proceedings.mlr.press/v330/legate26a.html.
