Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations

Yongshuo Zong, Tingyang Yu, Ruchika Chavhan, Bingchen Zhao, Timothy Hospedales
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62892-62913, 2024.

Abstract

Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code to reproduce all experiments is provided in supplementary materials.
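To make the vulnerability concrete, the following is a minimal sketch (not the authors' released code, which is provided in their supplementary materials) of how one might probe permutation sensitivity in MCQA. The helper `query_model` is a hypothetical stand-in for any LLM or VLM call that takes a prompt string and returns a predicted option letter such as "A".

```python
# Minimal sketch of probing permutation sensitivity in MCQA.
# Assumption: `query_model` is a hypothetical function wrapping the model under test;
# it accepts a prompt string and returns the predicted option letter, e.g. "A".
from itertools import permutations
from typing import Callable, List


def is_permutation_vulnerable(
    question: str,
    options: List[str],                 # answer texts; options[0] is assumed correct
    query_model: Callable[[str], str],
) -> bool:
    """Return True if some reordering of the answer options makes the model wrong."""
    letters = "ABCD"
    for perm in permutations(range(len(options))):
        # Rebuild the multiple-choice prompt with this ordering of the options.
        prompt = (
            question
            + "\n"
            + "\n".join(f"{letters[i]}. {options[p]}" for i, p in enumerate(perm))
            + "\nAnswer with the option letter."
        )
        predicted = query_model(prompt).strip().upper()[:1]
        correct_letter = letters[perm.index(0)]  # where the true answer landed
        if predicted != correct_letter:
            return True  # an adversarial permutation of the answer set exists
    return False
```

A permutation-invariant model would answer correctly for every ordering; the paper's finding is that popular models often fail under at least one such reordering, even when they answer the default ordering correctly.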

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zong24b,
  title     = {Fool Your ({V}ision and) Language Model with Embarrassingly Simple Permutations},
  author    = {Zong, Yongshuo and Yu, Tingyang and Chavhan, Ruchika and Zhao, Bingchen and Hospedales, Timothy},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {62892--62913},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zong24b/zong24b.pdf},
  url       = {https://proceedings.mlr.press/v235/zong24b.html},
  abstract  = {Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code to reproduce all experiments is provided in supplementary materials.}
}
Endnote
%0 Conference Paper
%T Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
%A Yongshuo Zong
%A Tingyang Yu
%A Ruchika Chavhan
%A Bingchen Zhao
%A Timothy Hospedales
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zong24b
%I PMLR
%P 62892--62913
%U https://proceedings.mlr.press/v235/zong24b.html
%V 235
%X Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code to reproduce all experiments is provided in supplementary materials.
APA
Zong, Y., Yu, T., Chavhan, R., Zhao, B. & Hospedales, T. (2024). Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:62892-62913. Available from https://proceedings.mlr.press/v235/zong24b.html.