Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning

Avinandan Bose, Laurent Lessard, Maryam Fazel, Krishnamurthy Dj Dvijotham
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4438-4446, 2025.

Abstract

The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data poisoning, necessitating the study of the robustness of learning algorithms against such attacks. Existing research on provable certified robustness against data poisoning primarily focuses on static adversaries, who modify a fraction of the dataset before the training algorithm is applied. In practice, particularly when learning from human feedback online, adversaries can observe and react to the learning process, injecting poisoned samples that optimize their adversarial objectives better than if they were restricted to poisoning a static dataset once, before training. Indeed, prior work has shown that online dynamic adversaries can be significantly more powerful than static ones. We present a novel framework for computing certified bounds on the impact of dynamic poisoning, and use these certificates to design robust learning algorithms. We illustrate the framework on mean estimation and binary classification problems and outline directions for extending it in future work.
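To make the static-versus-dynamic distinction concrete, here is a minimal toy simulation (not the paper's certification framework; the parameter values and the greedy attack rule are illustrative assumptions): a learner maintains a running mean over a stream of bounded samples, a static adversary must commit to its poisoned point in advance, and a dynamic adversary observes the learner's current estimate at each step and injects the feasible point that moves it closest to an attack target.

import numpy as np

rng = np.random.default_rng(0)
T = 2000        # stream length
eps = 0.05      # fraction of samples the adversary controls (hypothetical)
B = 10.0        # all samples, clean or poisoned, lie in [-B, B]
target = 0.4    # value the adversary wants the mean estimate dragged to

def run(adversary):
    # Online mean estimation: the learner keeps a running average theta.
    theta, n = 0.0, 0
    for _ in range(T):
        if rng.random() < eps:                    # adversary's sample
            x = adversary(theta, n)
        else:                                     # clean sample, true mean 0
            x = float(np.clip(rng.normal(), -B, B))
        n += 1
        theta += (x - theta) / n                  # incremental mean update
    return theta

def static_adv(theta, n):
    # Static adversary: ignores the learner's state; commits in advance to
    # the single point whose average effect best matches the target.
    return float(np.clip(target / eps, -B, B))

def dynamic_adv(theta, n):
    # Dynamic adversary: solves theta + (x - theta)/(n + 1) = target for x,
    # then clips to the feasible set, greedily steering the estimate.
    return float(np.clip(theta + (n + 1) * (target - theta), -B, B))

print("no attack:", run(lambda theta, n: 0.0))   # sanity check, close to 0
print("static   :", run(static_adv))             # matches target on average
print("dynamic  :", run(dynamic_adv))            # locks onto target quickly

In runs of this sketch, the dynamic adversary locks the estimate onto the target almost immediately and tracks it with little variance, while the static adversary only matches it on average, with larger fluctuations along the way; certifying bounds on the impact of exactly this kind of adaptive behavior is what the paper's framework addresses.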

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-bose25b,
  title     = {Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning},
  author    = {Bose, Avinandan and Lessard, Laurent and Fazel, Maryam and Dvijotham, Krishnamurthy Dj},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {4438--4446},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/bose25b/bose25b.pdf},
  url       = {https://proceedings.mlr.press/v258/bose25b.html}
}
Endnote
%0 Conference Paper
%T Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
%A Avinandan Bose
%A Laurent Lessard
%A Maryam Fazel
%A Krishnamurthy Dj Dvijotham
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-bose25b
%I PMLR
%P 4438--4446
%U https://proceedings.mlr.press/v258/bose25b.html
%V 258
APA
Bose, A., Lessard, L., Fazel, M. & Dvijotham, K. D. (2025). Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4438-4446. Available from https://proceedings.mlr.press/v258/bose25b.html.
