Selective Network Linearization for Efficient Private Inference

Minsu Cho, Ameya Joshi, Brandon Reagen, Siddharth Garg, Chinmay Hegde
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:3947-3961, 2022.

Abstract

Private inference (PI) enables inferences directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a “no free lunch” theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy.
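The core idea of selective linearization can be sketched as a gated activation: each ReLU site gets a learnable gate that interpolates between ReLU and the identity, and a sparsity penalty on the gates drives most of them toward the linear (cheap-in-PI) branch during gradient training. The snippet below is a minimal illustrative sketch in NumPy, not the paper's actual implementation; the gate parameterization, penalty weight `lam`, and hard-threshold step are assumptions for illustration.

```python
import numpy as np

def gated_relu(x, s):
    """Interpolate between ReLU (s=1) and identity (s=0), elementwise.

    During training s is a real-valued gate learned by gradient descent;
    an L1 penalty on s encourages linearization (s -> 0).
    """
    return s * np.maximum(x, 0.0) + (1.0 - s) * x

def sparsity_penalty(s, lam=1e-3):
    """L1 penalty pushing gates toward 0 (i.e., toward linearization)."""
    return lam * np.sum(np.abs(s))

# After training, gates are hard-thresholded: surviving ReLUs stay,
# the rest become identity (free in private inference).
x = np.array([-2.0, 3.0])
print(gated_relu(x, 1.0))  # full ReLU branch
print(gated_relu(x, 0.0))  # fully linearized branch
```

A gate value of 1 recovers the original ReLU output, a gate of 0 passes the pre-activation through unchanged; intermediate values arise only transiently during optimization before thresholding.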

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-cho22a,
  title     = {Selective Network Linearization for Efficient Private Inference},
  author    = {Cho, Minsu and Joshi, Ameya and Reagen, Brandon and Garg, Siddharth and Hegde, Chinmay},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {3947--3961},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/cho22a/cho22a.pdf},
  url       = {https://proceedings.mlr.press/v162/cho22a.html},
  abstract  = {Private inference (PI) enables inferences directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a ``no free lunch'' theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy.}
}
Endnote
%0 Conference Paper %T Selective Network Linearization for Efficient Private Inference %A Minsu Cho %A Ameya Joshi %A Brandon Reagen %A Siddharth Garg %A Chinmay Hegde %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-cho22a %I PMLR %P 3947--3961 %U https://proceedings.mlr.press/v162/cho22a.html %V 162 %X Private inference (PI) enables inferences directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a “no free lunch” theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy.
APA
Cho, M., Joshi, A., Reagen, B., Garg, S., & Hegde, C. (2022). Selective Network Linearization for Efficient Private Inference. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:3947-3961. Available from https://proceedings.mlr.press/v162/cho22a.html.