Filter bubbles and affective polarization in user-personalized large language model outputs

Han Wu; Sareh Rowlands; Johan Wahlstrom

Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, PMLR 296:20-25, 2025.

Abstract

As cloud computing becomes pervasive, deep learning models are deployed on cloud servers and provided as APIs to end users. However, black-box adversarial attacks can fool image classification models without access to the model structure and weights. Recent studies have reported attack success rates of over 95% with fewer than 1,000 queries. This raises the question of whether black-box attacks have become a real threat to cloud APIs. Our research indicates that black-box attacks are not as effective against cloud APIs as reported in research papers, owing to several common mistakes that overestimate their efficiency. To avoid similar mistakes, we conduct black-box attacks directly on cloud APIs rather than on local models.

Cite this Paper

BibTeX

@InProceedings{pmlr-v296-wu25a,
  title = 	 {Filter bubbles and affective polarization in user-personalized large language model outputs},
  author =       {Wu, Han and Rowlands, Sareh and Wahlstrom, Johan},
  booktitle = 	 {Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops},
  pages = 	 {20--25},
  year = 	 {2025},
  editor = 	 {Blaas, Arno and D’Costa, Priya and Feng, Fan and Kriegler, Andreas and Mason, Ian and Pan, Zhaoying and Uelwer, Tobias and Williams, Jennifer and Xie, Yubin and Yang, Rui},
  volume = 	 {296},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v296/main/assets/wu25a/wu25a.pdf},
  url = 	 {https://proceedings.mlr.press/v296/wu25a.html},
  abstract = 	 {As cloud computing becomes pervasive, deep learning models are deployed on cloud servers and then provided as APIs to end users. However, black-box adversarial attacks can fool image classification models without access to model structure and weights. Recent studies have reported attack success rates of over 95% with fewer than 1,000 queries. Then the question arises: whether black-box attacks have become a real threat against cloud APIs? To shed some light on this, our research indicates that black-box attacks are not as effective against cloud APIs as proposed in research papers due to several common mistakes that overestimate the efficiency of black-box attacks. To avoid similar mistakes, we conduct black-box attacks directly on cloud APIs rather than local models.}
}

Endnote

%0 Conference Paper
%T Filter bubbles and affective polarization in user-personalized large language model outputs
%A Han Wu
%A Sareh Rowlands
%A Johan Wahlstrom
%B Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops
%C Proceedings of Machine Learning Research
%D 2025
%E Arno Blaas
%E Priya D’Costa
%E Fan Feng
%E Andreas Kriegler
%E Ian Mason
%E Zhaoying Pan
%E Tobias Uelwer
%E Jennifer Williams
%E Yubin Xie
%E Rui Yang
%F pmlr-v296-wu25a
%I PMLR
%P 20--25
%U https://proceedings.mlr.press/v296/wu25a.html
%V 296
%X As cloud computing becomes pervasive, deep learning models are deployed on cloud servers and then provided as APIs to end users. However, black-box adversarial attacks can fool image classification models without access to model structure and weights. Recent studies have reported attack success rates of over 95% with fewer than 1,000 queries. Then the question arises: whether black-box attacks have become a real threat against cloud APIs? To shed some light on this, our research indicates that black-box attacks are not as effective against cloud APIs as proposed in research papers due to several common mistakes that overestimate the efficiency of black-box attacks. To avoid similar mistakes, we conduct black-box attacks directly on cloud APIs rather than local models.

APA

Wu, H., Rowlands, S. & Wahlstrom, J. (2025). Filter bubbles and affective polarization in user-personalized large language model outputs. Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, in Proceedings of Machine Learning Research 296:20-25. Available from https://proceedings.mlr.press/v296/wu25a.html.
