Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning

Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7074-7084, 2019.

Abstract

We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior, and in this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used. We develop ByzantinePGD, a robust first-order algorithm that can provably escape saddle points and fake local minima, and converge to an approximate true local minimizer with low iteration complexity. As a by-product, we give a simpler algorithm and analysis for escaping saddle points in the usual non-Byzantine setting. We further discuss three robust gradient estimators that can be used in ByzantinePGD, including median, trimmed mean, and iterative filtering. We characterize their performance in concrete statistical settings, and argue for their near-optimality in low and high dimensional regimes.
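The abstract names two algorithmic ingredients: coordinate-wise robust gradient estimators (median and trimmed mean) and a perturbed-descent outer loop that escapes saddle points and fake local minima. The NumPy sketch below is a minimal illustration of both, not the paper's exact algorithm: the names `byzantine_pgd` and `robust_grad`, the step size `eta`, the perturbation `radius`, and the decrease-based stopping rule are illustrative assumptions; the paper specifies the precise constants and guarantees.

```python
import numpy as np

def coordinate_median(grads):
    """Coordinate-wise median of m reported gradients (an m x d array)."""
    return np.median(grads, axis=0)

def trimmed_mean(grads, beta):
    """Coordinate-wise beta-trimmed mean: in each coordinate, drop the
    beta*m smallest and beta*m largest values, then average the rest."""
    m = grads.shape[0]
    k = int(beta * m)
    assert 0 <= k < m / 2, "trimming fraction must leave values to average"
    sorted_grads = np.sort(grads, axis=0)      # sort each coordinate separately
    return sorted_grads[k:m - k].mean(axis=0)  # average the middle values

def byzantine_pgd(x0, robust_grad, f, eta=0.1, radius=1e-2, grad_tol=1e-3,
                  escape_steps=50, max_iters=10000, seed=0):
    """Hypothetical perturbed-descent loop in the spirit of ByzantinePGD:
    take robust gradient steps; when the aggregated gradient is small (a
    candidate saddle point or fake local minimum), inject a random
    perturbation and keep the result only if the loss actually decreases."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = robust_grad(x)
        if np.linalg.norm(g) > grad_tol:
            x = x - eta * g                    # ordinary descent step
            continue
        # Small gradient: perturb, then run a short descent episode.
        x_try = x + radius * rng.standard_normal(x.shape)
        for _ in range(escape_steps):
            x_try = x_try - eta * robust_grad(x_try)
        if f(x_try) < f(x):                    # loss decreased: escaped, go on
            x = x_try
        else:                                  # no decrease: report x as an
            return x                           # approximate local minimizer
    return x
```

With the m worker gradients stacked into an m x d array, `robust_grad` could be, for instance, `lambda x: trimmed_mean(np.stack([g(x) for g in worker_grads]), beta=0.1)`, where `worker_grads` is a hypothetical list of per-machine gradient oracles; the paper's third estimator, iterative filtering, fits the same interface and is argued to be better suited to the high-dimensional regime.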

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-yin19a,
  title     = {Defending Against Saddle Point Attack in {B}yzantine-Robust Distributed Learning},
  author    = {Yin, Dong and Chen, Yudong and Ramchandran, Kannan and Bartlett, Peter},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7074--7084},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/yin19a/yin19a.pdf},
  url       = {https://proceedings.mlr.press/v97/yin19a.html},
  abstract  = {We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior, and in this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used. We develop ByzantinePGD, a robust first-order algorithm that can provably escape saddle points and fake local minima, and converge to an approximate true local minimizer with low iteration complexity. As a by-product, we give a simpler algorithm and analysis for escaping saddle points in the usual non-Byzantine setting. We further discuss three robust gradient estimators that can be used in ByzantinePGD, including median, trimmed mean, and iterative filtering. We characterize their performance in concrete statistical settings, and argue for their near-optimality in low and high dimensional regimes.}
}
Endnote
%0 Conference Paper
%T Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning
%A Dong Yin
%A Yudong Chen
%A Kannan Ramchandran
%A Peter Bartlett
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-yin19a
%I PMLR
%P 7074--7084
%U https://proceedings.mlr.press/v97/yin19a.html
%V 97
%X We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior, and in this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used. We develop ByzantinePGD, a robust first-order algorithm that can provably escape saddle points and fake local minima, and converge to an approximate true local minimizer with low iteration complexity. As a by-product, we give a simpler algorithm and analysis for escaping saddle points in the usual non-Byzantine setting. We further discuss three robust gradient estimators that can be used in ByzantinePGD, including median, trimmed mean, and iterative filtering. We characterize their performance in concrete statistical settings, and argue for their near-optimality in low and high dimensional regimes.
APA
Yin, D., Chen, Y., Ramchandran, K. & Bartlett, P. (2019). Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7074-7084. Available from https://proceedings.mlr.press/v97/yin19a.html.
