Explainable and Privacy-Preserving Machine Learning via Domain-Aware Symbolic Regression

Kei Sen Fong; Mehul Motani

Proceedings of the fifth Conference on Health, Inference, and Learning, PMLR 248:198-216, 2024.

Abstract

Explainability and privacy are the top concerns in machine learning (ML) for medical applications. In this paper, we propose a novel method, Domain-Aware Symbolic Regression with Homomorphic Encryption (DASR-HE), that addresses both concerns simultaneously by: (i) producing domain-aware, intuitive and explainable models that do not require the end-user to possess ML expertise, and (ii) training only on securely encrypted data without access to actual data values or model parameters. DASR-HE is based on Symbolic Regression (SR), which is a first-class ML approach that produces simple and concise equations for regression, requiring no ML expertise to interpret. In our work, we improve the performance of SR algorithms by using existing domain-specific medical equations to augment the search space of equations, decreasing the search complexity and producing equations that are similar in structure to those used in practice. To preserve the privacy of the medical data, we enable our algorithm to learn on data that is homomorphically encrypted (HE), meaning that arithmetic operations can be done in the encrypted space. This makes HE suitable for machine learning algorithms to learn models without access to the actual data values or model parameters. We evaluate DASR-HE on three medical tasks, namely predicting glomerular filtration rate, endotracheal tube (ETT) internal diameter, and ETT depth, and find that DASR-HE outperforms existing medical equations, other SR ML algorithms, and other explainable ML algorithms.
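
The phrase "arithmetic operations can be done in the encrypted space" can be illustrated with a toy sketch. The abstract does not say which encryption scheme DASR-HE uses, so the code below is not the authors' method: it assumes the textbook Paillier cryptosystem, which is only additively homomorphic, and uses deliberately tiny, insecure primes. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `scale_encrypted`) are hypothetical and chosen for illustration only.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy Paillier key generation. The small primes are an assumption
    for readability; they offer no security."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+), valid for g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to a public integer k multiplies the plaintext by k."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    n, priv = keygen()
    c_a = encrypt(n, 40)  # e.g. an encrypted feature value
    c_b = encrypt(n, 12)  # a second encrypted value
    # Evaluate the linear expression 3*a + b without decrypting a or b.
    c_out = add_encrypted(n, scale_encrypted(n, c_a, 3), c_b)
    assert decrypt(n, priv, c_out) == 3 * 40 + 12
    print(decrypt(n, priv, c_out))
```

The party that evaluates `3*a + b` here sees only ciphertexts; only the holder of the private key can read the result. Expressions that multiply two encrypted values together would need a scheme that supports more than addition, which this sketch does not cover.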

Cite this Paper

BibTeX


@InProceedings{pmlr-v248-fong24a,
  title     = {Explainable and Privacy-Preserving Machine Learning via Domain-Aware Symbolic Regression},
  author    = {Fong, Kei Sen and Motani, Mehul},
  booktitle = {Proceedings of the fifth Conference on Health, Inference, and Learning},
  pages     = {198--216},
  year      = {2024},
  editor    = {Pollard, Tom and Choi, Edward and Singhal, Pankhuri and Hughes, Michael and Sizikova, Elena and Mortazavi, Bobak and Chen, Irene and Wang, Fei and Sarker, Tasmie and McDermott, Matthew and Ghassemi, Marzyeh},
  volume    = {248},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--28 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v248/main/assets/fong24a/fong24a.pdf},
  url       = {https://proceedings.mlr.press/v248/fong24a.html},
  abstract  = {Explainability and privacy are the top concerns in machine learning (ML) for medical applications. In this paper, we propose a novel method, Domain-Aware Symbolic Regression with Homomorphic Encryption (DASR-HE), that addresses both concerns simultaneously by: (i) producing domain-aware, intuitive and explainable models that do not require the end-user to possess ML expertise and (ii) training only on securely encrypted data without access to actual data values or model parameters. DASR-HE is based on Symbolic Regression (SR), which is a first-class ML approach that produces simple and concise equations for regression, requiring no ML expertise to interpret. In our work, we improve the performance of SR algorithms by using existing domain-specific medical equations to augment the search space of equations, decreasing the search complexity and producing equations that are similar in structure to those used in practice. To preserve the privacy of the medical data, we enable our algorithm to learn on data that is homomorphically encrypted (HE), meaning that arithmetic operations can be done in the encrypted space. This makes HE suitable for machine learning algorithms to learn models without access to the actual data values or model parameters. We evaluate DASR-HE on three medical tasks, namely predicting glomerular filtration rate, endotracheal tube (ETT) internal diameter and ETT depth and find that DASR-HE outperforms existing medical equations, other SR ML algorithms and other explainable ML algorithms.}
}

Endnote

%0 Conference Paper
%T Explainable and Privacy-Preserving Machine Learning via Domain-Aware Symbolic Regression
%A Kei Sen Fong
%A Mehul Motani
%B Proceedings of the fifth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Tom Pollard
%E Edward Choi
%E Pankhuri Singhal
%E Michael Hughes
%E Elena Sizikova
%E Bobak Mortazavi
%E Irene Chen
%E Fei Wang
%E Tasmie Sarker
%E Matthew McDermott
%E Marzyeh Ghassemi
%F pmlr-v248-fong24a
%I PMLR
%P 198--216
%U https://proceedings.mlr.press/v248/fong24a.html
%V 248
%X Explainability and privacy are the top concerns in machine learning (ML) for medical applications. In this paper, we propose a novel method, Domain-Aware Symbolic Regression with Homomorphic Encryption (DASR-HE), that addresses both concerns simultaneously by: (i) producing domain-aware, intuitive and explainable models that do not require the end-user to possess ML expertise and (ii) training only on securely encrypted data without access to actual data values or model parameters. DASR-HE is based on Symbolic Regression (SR), which is a first-class ML approach that produces simple and concise equations for regression, requiring no ML expertise to interpret. In our work, we improve the performance of SR algorithms by using existing domain-specific medical equations to augment the search space of equations, decreasing the search complexity and producing equations that are similar in structure to those used in practice. To preserve the privacy of the medical data, we enable our algorithm to learn on data that is homomorphically encrypted (HE), meaning that arithmetic operations can be done in the encrypted space. This makes HE suitable for machine learning algorithms to learn models without access to the actual data values or model parameters. We evaluate DASR-HE on three medical tasks, namely predicting glomerular filtration rate, endotracheal tube (ETT) internal diameter and ETT depth and find that DASR-HE outperforms existing medical equations, other SR ML algorithms and other explainable ML algorithms.

APA


Fong, K.S. & Motani, M. (2024). Explainable and Privacy-Preserving Machine Learning via Domain-Aware Symbolic Regression. Proceedings of the fifth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 248:198-216. Available from https://proceedings.mlr.press/v248/fong24a.html.
