Federated Heavy Hitter Recovery under Linear Sketching

Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10997-11012, 2023.

Abstract

Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is $\Theta(\sqrt{R})$. We also empirically demonstrate our findings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-gascon23a, title = {Federated Heavy Hitter Recovery under Linear Sketching}, author = {Gascon, Adria and Kairouz, Peter and Sun, Ziteng and Suresh, Ananda Theertha}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {10997--11012}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/gascon23a/gascon23a.pdf}, url = {https://proceedings.mlr.press/v202/gascon23a.html}, abstract = {Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is $\Theta(\sqrt{R})$. We also empirically demonstrate our findings.} }
Endnote
%0 Conference Paper %T Federated Heavy Hitter Recovery under Linear Sketching %A Adria Gascon %A Peter Kairouz %A Ziteng Sun %A Ananda Theertha Suresh %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-gascon23a %I PMLR %P 10997--11012 %U https://proceedings.mlr.press/v202/gascon23a.html %V 202 %X Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is $\Theta(\sqrt{R})$. We also empirically demonstrate our findings.
APA
Gascon, A., Kairouz, P., Sun, Z. & Suresh, A.T.. (2023). Federated Heavy Hitter Recovery under Linear Sketching. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10997-11012 Available from https://proceedings.mlr.press/v202/gascon23a.html.

Related Material