PASS: Private Attributes Protection with Stochastic Data Substitution

Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Tarek F. Abdelzaher
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:7816-7839, 2025.

Abstract

The growth of Machine Learning (ML) services requires extensive collection of user data, which may inadvertently include people's private information that is irrelevant to the services. Various methods have been proposed to protect private attributes by removing them from the data while maintaining the data's utility for downstream tasks. Nevertheless, as we show both theoretically and empirically in this paper, these methods exhibit a severe vulnerability caused by a common weakness rooted in their adversarial-training-based strategies. To overcome this limitation, we propose a novel approach, PASS, which stochastically substitutes the original sample with another one according to certain probabilities and is trained with a novel loss function soundly derived from an information-theoretic objective for utility-preserving private attribute protection. A comprehensive evaluation of PASS on datasets of different modalities, including facial images, human activity sensor signals, and voice recordings, substantiates PASS's effectiveness and generalizability.
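
As a rough illustration of the idea described above (not the authors' implementation), the sketch below shows one way a stochastic substitution module could be structured: it scores a pool of candidate samples, turns the scores into substitution probabilities, and releases a candidate drawn from that distribution. The network shape, the candidate-pool mechanism, and every name here are assumptions for illustration only; utility-preserving objectives of this kind are commonly formulated as keeping the released data informative about the downstream task while penalizing information it carries about the private attribute, and the paper derives its actual loss from such an information-theoretic objective.

    # Conceptual sketch only; assumes PyTorch. Names, architecture, and the
    # candidate-pool mechanism are illustrative, not the method from the paper.
    import torch
    import torch.nn as nn

    class StochasticSubstituter(nn.Module):
        """Scores a pool of candidate substitutes for each input sample and
        releases one candidate drawn from the resulting distribution."""
        def __init__(self, feature_dim: int, pool_size: int):
            super().__init__()
            self.scorer = nn.Sequential(
                nn.Linear(feature_dim, 128),
                nn.ReLU(),
                nn.Linear(128, pool_size),
            )

        def forward(self, x: torch.Tensor, pool: torch.Tensor):
            # x: (batch, feature_dim); pool: (pool_size, feature_dim)
            logits = self.scorer(x)                        # per-candidate scores
            probs = torch.softmax(logits, dim=-1)          # substitution probabilities
            idx = torch.multinomial(probs, num_samples=1)  # draw one substitute per sample
            return pool[idx.squeeze(-1)], probs            # released samples, probabilities

    # Usage: release a substituted batch instead of the raw data.
    sub = StochasticSubstituter(feature_dim=64, pool_size=32)
    released, probs = sub(torch.randn(8, 64), torch.randn(32, 64))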

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-chen25h,
  title     = {{PASS}: Private Attributes Protection with Stochastic Data Substitution},
  author    = {Chen, Yizhuo and Chen, Chun-Fu and Hsu, Hsiang and Hu, Shaohan and Abdelzaher, Tarek F.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {7816--7839},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chen25h/chen25h.pdf},
  url       = {https://proceedings.mlr.press/v267/chen25h.html}
}
Endnote
%0 Conference Paper
%T PASS: Private Attributes Protection with Stochastic Data Substitution
%A Yizhuo Chen
%A Chun-Fu Chen
%A Hsiang Hsu
%A Shaohan Hu
%A Tarek F. Abdelzaher
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-chen25h
%I PMLR
%P 7816--7839
%U https://proceedings.mlr.press/v267/chen25h.html
%V 267
APA
Chen, Y., Chen, C., Hsu, H., Hu, S., & Abdelzaher, T. F. (2025). PASS: Private Attributes Protection with Stochastic Data Substitution. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:7816-7839. Available from https://proceedings.mlr.press/v267/chen25h.html.