Byzantine-Robust Optimization under $(L_0,L_1)$-Smoothness
Conference on Parsimony and Learning, PMLR 328:826-854, 2026.
Abstract
We consider distributed optimization under Byzantine attacks under the $(L_0,L_1)$-smoothness condition, a generalization of standard $L$-smoothness that captures functions whose gradient Lipschitz constant depends on the current state. We propose $\texttt{Byz-NSGDM}$, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle the challenges posed by both $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that $\texttt{Byz-NSGDM}$ achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and the gradient heterogeneity. Experimental validation on heterogeneous MNIST classification and synthetic $(L_0,L_1)$-smooth optimization problems demonstrates the effectiveness of our approach against various Byzantine attack strategies.
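To make the described mechanism concrete, the Python sketch below outlines one round of the kind of update the abstract refers to: worker-side momentum, server-side NNM pre-aggregation followed by a robust aggregator, and a normalized step. The choice of coordinate-wise median as the aggregator, the function names, and the momentum parameterization are illustrative assumptions, not the paper's exact specification.

    import numpy as np

    def nnm(vectors, f):
        """Nearest Neighbor Mixing (illustrative): replace each worker's vector by
        the average of its n - f nearest neighbors (itself included) in Euclidean
        distance, pre-conditioning the inputs before robust aggregation."""
        n = len(vectors)
        V = np.stack(vectors)                                            # (n, d)
        dists = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)   # (n, n)
        mixed = np.empty_like(V)
        for i in range(n):
            neighbors = np.argsort(dists[i])[: n - f]   # n - f closest vectors
            mixed[i] = V[neighbors].mean(axis=0)
        return mixed

    def robust_aggregate(V):
        """A simple robust aggregator (coordinate-wise median); the paper's
        aggregator may differ."""
        return np.median(V, axis=0)

    def byz_nsgdm_step(x, momenta, grads, alpha, gamma, f):
        """One illustrative round of a normalized, Byzantine-robust momentum update."""
        # Worker-side momentum: m_i <- (1 - alpha) * m_i + alpha * g_i
        momenta = [(1 - alpha) * m + alpha * g for m, g in zip(momenta, grads)]
        # Server-side: NNM pre-aggregation, then robust aggregation.
        agg = robust_aggregate(nnm(momenta, f))
        # Normalized step: bounds the update size even when gradient norms grow,
        # which is the relevant regime under (L_0, L_1)-smoothness.
        x_new = x - gamma * agg / (np.linalg.norm(agg) + 1e-12)
        return x_new, momenta

The normalization in the last step is what decouples the step length from the (possibly unbounded) gradient magnitude, while NNM plus the robust aggregator limits the influence of Byzantine workers; the small constant added to the norm is only a numerical safeguard in this sketch.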