Filter bubbles and affective polarization in user-personalized large language model outputs

Han Wu, Sareh Rowlands, Johan Wahlstrom
Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, PMLR 296:20-25, 2025.

Abstract

As cloud computing becomes pervasive, deep learning models are increasingly deployed on cloud servers and provided to end users as APIs. However, black-box adversarial attacks can fool image classification models without access to model structure or weights, and recent studies have reported attack success rates above 95% with fewer than 1,000 queries. This raises the question: have black-box attacks become a real threat to cloud APIs? Our research indicates that black-box attacks are not as effective against cloud APIs as research papers suggest, owing to several common mistakes that overestimate the efficiency of black-box attacks. To avoid these mistakes, we conduct black-box attacks directly on cloud APIs rather than on local models.
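The abstract's central point is that a black-box attack only queries a deployed classifier and watches its output scores; it never inspects weights. As a minimal sketch of such a query-based attack loop, the Python below implements a SimBA-style random-coordinate search against a placeholder query_api function. The query_api stub, its 10-class output, and all parameter values are illustrative assumptions (the paper's actual attack code and target APIs are not reproduced here); a real run would replace query_api with an HTTP call to the cloud service.

import numpy as np

def query_api(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a cloud image-classification API.

    A real attack would POST `image` to the service and parse the returned
    class probabilities; dummy scores keep this sketch runnable offline.
    """
    rng = np.random.default_rng(abs(int(image.sum() * 1e6)) % (2**32))
    logits = rng.normal(size=10)
    return np.exp(logits) / np.exp(logits).sum()

def simba_attack(image, true_label, epsilon=0.05, max_queries=1000):
    """SimBA-style black-box attack: nudge one randomly chosen pixel up or
    down and keep the change only if the true-class confidence drops."""
    x = image.copy()
    probs = query_api(x)
    queries = 1
    for idx in np.random.permutation(x.size):
        # Stop when the budget is spent or the API already misclassifies x.
        if queries >= max_queries or np.argmax(probs) != true_label:
            break
        for sign in (epsilon, -epsilon):
            candidate = x.copy()
            candidate.flat[idx] = np.clip(candidate.flat[idx] + sign, 0.0, 1.0)
            cand_probs = query_api(candidate)
            queries += 1
            if cand_probs[true_label] < probs[true_label]:
                x, probs = candidate, cand_probs  # keep the useful nudge
                break
    return x, probs, queries

if __name__ == "__main__":
    img = np.random.default_rng(0).random((32, 32, 3))
    adv, probs, used = simba_attack(img, true_label=3)
    print(f"true-class probability {probs[3]:.3f} after {used} queries; "
          f"predicted class {int(np.argmax(probs))}")

The printed numbers are meaningless here because query_api returns dummy scores, but the structure (query, compare confidence, keep or discard the perturbation, stop at the budget) is the part the abstract is about: every step costs a billable, rate-limited API call, which is why evaluating against a cloud API rather than a local model can change reported query-efficiency results.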

Cite this Paper


BibTeX
@InProceedings{pmlr-v296-wu25a,
  title     = {Filter bubbles and affective polarization in user-personalized large language model outputs},
  author    = {Wu, Han and Rowlands, Sareh and Wahlstrom, Johan},
  booktitle = {Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops},
  pages     = {20--25},
  year      = {2025},
  editor    = {Blaas, Arno and D’Costa, Priya and Feng, Fan and Kriegler, Andreas and Mason, Ian and Pan, Zhaoying and Uelwer, Tobias and Williams, Jennifer and Xie, Yubin and Yang, Rui},
  volume    = {296},
  series    = {Proceedings of Machine Learning Research},
  month     = {28 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v296/main/assets/wu25a/wu25a.pdf},
  url       = {https://proceedings.mlr.press/v296/wu25a.html},
  abstract  = {As cloud computing becomes pervasive, deep learning models are increasingly deployed on cloud servers and provided to end users as APIs. However, black-box adversarial attacks can fool image classification models without access to model structure or weights, and recent studies have reported attack success rates above 95% with fewer than 1,000 queries. This raises the question: have black-box attacks become a real threat to cloud APIs? Our research indicates that black-box attacks are not as effective against cloud APIs as research papers suggest, owing to several common mistakes that overestimate the efficiency of black-box attacks. To avoid these mistakes, we conduct black-box attacks directly on cloud APIs rather than on local models.}
}
Endnote
%0 Conference Paper
%T Filter bubbles and affective polarization in user-personalized large language model outputs
%A Han Wu
%A Sareh Rowlands
%A Johan Wahlstrom
%B Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops
%C Proceedings of Machine Learning Research
%D 2025
%E Arno Blaas
%E Priya D’Costa
%E Fan Feng
%E Andreas Kriegler
%E Ian Mason
%E Zhaoying Pan
%E Tobias Uelwer
%E Jennifer Williams
%E Yubin Xie
%E Rui Yang
%F pmlr-v296-wu25a
%I PMLR
%P 20--25
%U https://proceedings.mlr.press/v296/wu25a.html
%V 296
%X As cloud computing becomes pervasive, deep learning models are increasingly deployed on cloud servers and provided to end users as APIs. However, black-box adversarial attacks can fool image classification models without access to model structure or weights, and recent studies have reported attack success rates above 95% with fewer than 1,000 queries. This raises the question: have black-box attacks become a real threat to cloud APIs? Our research indicates that black-box attacks are not as effective against cloud APIs as research papers suggest, owing to several common mistakes that overestimate the efficiency of black-box attacks. To avoid these mistakes, we conduct black-box attacks directly on cloud APIs rather than on local models.
APA
Wu, H., Rowlands, S. & Wahlstrom, J. (2025). Filter bubbles and affective polarization in user-personalized large language model outputs. Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, in Proceedings of Machine Learning Research 296:20-25. Available from https://proceedings.mlr.press/v296/wu25a.html.
