RDM-DC: Poisoning Resilient Dataset Condensation with Robust Distribution Matching

Tianhang Zheng, Baochun Li
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2541-2550, 2023.

Abstract

Dataset condensation aims to condense the original training dataset into a small synthetic dataset for data-efficient learning. The recently proposed dataset condensation techniques allow the model trainers with limited resources to learn acceptable deep learning models on a small amount of synthetic data. However, in an adversarial environment, given the original dataset as a poisoned dataset, dataset condensation may encode the poisoning information into the condensed synthetic dataset. To explore the vulnerability of dataset condensation to data poisoning, we revisit the state-of-the-art targeted data poisoning method and customize a targeted data poisoning algorithm for dataset condensation. By executing the two poisoning methods, we demonstrate that, when the synthetic dataset is condensed from a poisoned dataset, the models trained on the synthetic dataset may predict the targeted sample as the attack-targeted label. To defend against data poisoning, we introduce the concept of poisoned deviation to quantify the poisoning effect. We further propose a poisoning-resilient dataset condensation algorithm with a calibration method to reduce poisoned deviation. Extensive evaluations demonstrate that our proposed algorithm can protect the synthetic dataset from data poisoning with minor performance drop.

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-zheng23a, title = {{RDM-DC}: Poisoning Resilient Dataset Condensation with Robust Distribution Matching}, author = {Zheng, Tianhang and Li, Baochun}, booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence}, pages = {2541--2550}, year = {2023}, editor = {Evans, Robin J. and Shpitser, Ilya}, volume = {216}, series = {Proceedings of Machine Learning Research}, month = {31 Jul--04 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v216/zheng23a/zheng23a.pdf}, url = {https://proceedings.mlr.press/v216/zheng23a.html}, abstract = {Dataset condensation aims to condense the original training dataset into a small synthetic dataset for data-efficient learning. The recently proposed dataset condensation techniques allow the model trainers with limited resources to learn acceptable deep learning models on a small amount of synthetic data. However, in an adversarial environment, given the original dataset as a poisoned dataset, dataset condensation may encode the poisoning information into the condensed synthetic dataset. To explore the vulnerability of dataset condensation to data poisoning, we revisit the state-of-the-art targeted data poisoning method and customize a targeted data poisoning algorithm for dataset condensation. By executing the two poisoning methods, we demonstrate that, when the synthetic dataset is condensed from a poisoned dataset, the models trained on the synthetic dataset may predict the targeted sample as the attack-targeted label. To defend against data poisoning, we introduce the concept of poisoned deviation to quantify the poisoning effect. We further propose a poisoning-resilient dataset condensation algorithm with a calibration method to reduce poisoned deviation. Extensive evaluations demonstrate that our proposed algorithm can protect the synthetic dataset from data poisoning with minor performance drop.} }
Endnote
%0 Conference Paper %T RDM-DC: Poisoning Resilient Dataset Condensation with Robust Distribution Matching %A Tianhang Zheng %A Baochun Li %B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2023 %E Robin J. Evans %E Ilya Shpitser %F pmlr-v216-zheng23a %I PMLR %P 2541--2550 %U https://proceedings.mlr.press/v216/zheng23a.html %V 216 %X Dataset condensation aims to condense the original training dataset into a small synthetic dataset for data-efficient learning. The recently proposed dataset condensation techniques allow the model trainers with limited resources to learn acceptable deep learning models on a small amount of synthetic data. However, in an adversarial environment, given the original dataset as a poisoned dataset, dataset condensation may encode the poisoning information into the condensed synthetic dataset. To explore the vulnerability of dataset condensation to data poisoning, we revisit the state-of-the-art targeted data poisoning method and customize a targeted data poisoning algorithm for dataset condensation. By executing the two poisoning methods, we demonstrate that, when the synthetic dataset is condensed from a poisoned dataset, the models trained on the synthetic dataset may predict the targeted sample as the attack-targeted label. To defend against data poisoning, we introduce the concept of poisoned deviation to quantify the poisoning effect. We further propose a poisoning-resilient dataset condensation algorithm with a calibration method to reduce poisoned deviation. Extensive evaluations demonstrate that our proposed algorithm can protect the synthetic dataset from data poisoning with minor performance drop.
APA
Zheng, T. & Li, B.. (2023). RDM-DC: Poisoning Resilient Dataset Condensation with Robust Distribution Matching. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2541-2550 Available from https://proceedings.mlr.press/v216/zheng23a.html.

Related Material