SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference

Ran Ran; Xinwei Luo; Wei Wang; Tao Liu; Gang Quan; Xiaolin Xu; Caiwen Ding; Wujie Wen

SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference

Ran Ran, Xinwei Luo, Wei Wang, Tao Liu, Gang Quan, Xiaolin Xu, Caiwen Ding, Wujie Wen

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:28718-28728, 2023.

Abstract

Homomorphic Encryption (HE) is a promising technology to protect clients’ data privacy for Machine Learning as a Service (MLaaS) on public clouds. However, HE operations can be orders of magnitude slower than their counterparts for plaintexts and thus result in prohibitively high inference latency, seriously hindering the practicality of HE. In this paper, we propose a HE-based fast neural network (NN) inference framework–SpENCNN built upon the co-design of HE operation-aware model sparsity and the single-instruction-multiple-data (SIMD)-friendly data packing, to improve NN inference latency. In particular, we first develop an encryption-aware HE-group convolution technique that can partition channels among different groups based on the data size and ciphertext size, and then encode them into the same ciphertext by novel group-interleaved encoding, so as to dramatically reduce the number of bottlenecked operations in HE convolution. We further tailor a HE-friendly sub-block weight pruning to reduce the costly HE-based convolution operation. Our experiments show that SpENCNN can achieve overall speedups of 8.37

$\times$ , 12.11

$\times$ , 19.26

$\times$ , and 1.87

$\times$ for LeNet, VGG-5, HEFNet, and ResNet-20 respectively, with negligible accuracy loss. Our code is publicly available at https://github.com/ranran0523/SPECNN.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-ran23b,
  title = 	 {{S}p{ENCNN}: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference},
  author =       {Ran, Ran and Luo, Xinwei and Wang, Wei and Liu, Tao and Quan, Gang and Xu, Xiaolin and Ding, Caiwen and Wen, Wujie},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {28718--28728},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/ran23b/ran23b.pdf},
  url = 	 {https://proceedings.mlr.press/v202/ran23b.html},
  abstract = 	 {Homomorphic Encryption (HE) is a promising technology to protect clients’ data privacy for Machine Learning as a Service (MLaaS) on public clouds. However, HE operations can be orders of magnitude slower than their counterparts for plaintexts and thus result in prohibitively high inference latency, seriously hindering the practicality of HE. In this paper, we propose a HE-based fast neural network (NN) inference framework–SpENCNN built upon the co-design of HE operation-aware model sparsity and the single-instruction-multiple-data (SIMD)-friendly data packing, to improve NN inference latency. In particular, we first develop an encryption-aware HE-group convolution technique that can partition channels among different groups based on the data size and ciphertext size, and then encode them into the same ciphertext by novel group-interleaved encoding, so as to dramatically reduce the number of bottlenecked operations in HE convolution. We further tailor a HE-friendly sub-block weight pruning to reduce the costly HE-based convolution operation. Our experiments show that SpENCNN can achieve overall speedups of 8.37$\times$, 12.11$\times$, 19.26$\times$, and 1.87$\times$ for LeNet, VGG-5, HEFNet, and ResNet-20 respectively, with negligible accuracy loss. Our code is publicly available at https://github.com/ranran0523/SPECNN.}
}

Endnote

%0 Conference Paper
%T SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference
%A Ran Ran
%A Xinwei Luo
%A Wei Wang
%A Tao Liu
%A Gang Quan
%A Xiaolin Xu
%A Caiwen Ding
%A Wujie Wen
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-ran23b
%I PMLR
%P 28718--28728
%U https://proceedings.mlr.press/v202/ran23b.html
%V 202
%X Homomorphic Encryption (HE) is a promising technology to protect clients’ data privacy for Machine Learning as a Service (MLaaS) on public clouds. However, HE operations can be orders of magnitude slower than their counterparts for plaintexts and thus result in prohibitively high inference latency, seriously hindering the practicality of HE. In this paper, we propose a HE-based fast neural network (NN) inference framework–SpENCNN built upon the co-design of HE operation-aware model sparsity and the single-instruction-multiple-data (SIMD)-friendly data packing, to improve NN inference latency. In particular, we first develop an encryption-aware HE-group convolution technique that can partition channels among different groups based on the data size and ciphertext size, and then encode them into the same ciphertext by novel group-interleaved encoding, so as to dramatically reduce the number of bottlenecked operations in HE convolution. We further tailor a HE-friendly sub-block weight pruning to reduce the costly HE-based convolution operation. Our experiments show that SpENCNN can achieve overall speedups of 8.37$\times$, 12.11$\times$, 19.26$\times$, and 1.87$\times$ for LeNet, VGG-5, HEFNet, and ResNet-20 respectively, with negligible accuracy loss. Our code is publicly available at https://github.com/ranran0523/SPECNN.

APA


Ran, R., Luo, X., Wang, W., Liu, T., Quan, G., Xu, X., Ding, C. & Wen, W.. (2023). SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:28718-28728 Available from https://proceedings.mlr.press/v202/ran23b.html.

SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference

Abstract

Cite this Paper

Related Material