How Language Model Hallucinations Can Snowball

Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:59670-59684, 2024.

Abstract

A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we show that LMs sometimes produce hallucinations that they can separately recognize as incorrect. To do this, we construct three question-answering datasets where LMs often state an incorrect answer, followed by an explanation containing at least one incorrect claim. Crucially, we find that GPT-3.5, GPT-4, and LLaMA2-70B-chat can identify 67%, 87%, and 94% of these incorrect claims, respectively. We show that this phenomenon doesn’t disappear under higher-temperature sampling, beam search, or zero-shot chain-of-thought prompting. These findings reveal that LM hallucinations can snowball: early mistakes by an LM can lead to further mistakes that would not otherwise be made.
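
The numbers above come from a two-stage probe: the model first commits to a yes/no answer and justifies it in a single generation, and the same model is then asked, in a separate context, whether a specific claim from that justification is true. Below is a minimal sketch of that setup, assuming hypothetical query_lm and extract_claim helpers; it is not the authors' released code or their exact prompt templates.

    # Minimal sketch of the two-stage check described in the abstract (not the
    # paper's released code or exact prompts).
    # `query_lm` is a hypothetical callable that sends one prompt to the model
    # under test (GPT-3.5, GPT-4, LLaMA2-70B-chat, ...) and returns its reply.
    # `extract_claim` is a hypothetical helper that pulls one checkable claim
    # out of the model's explanation.
    def answer_then_verify(question, query_lm, extract_claim):
        # Stage 1: one generation in which the model commits to a yes/no
        # answer and then explains it.
        answer = query_lm(
            f"{question}\nAnswer yes or no, then explain your reasoning."
        )

        # Isolate a single concrete claim asserted in the explanation.
        claim = extract_claim(answer)

        # Stage 2: in a fresh context, ask the *same* model whether that
        # claim is true.
        verdict = query_lm(f"Is the following statement true or false?\n{claim}")

        # A snowballed hallucination is a claim the model asserted while
        # defending an incorrect answer but labels false here.
        return {"answer": answer, "claim": claim, "verdict": verdict}

Under this framing, the reported 67%, 87%, and 94% are the fraction of such incorrect claims that each model flags as false in the second, stand-alone check.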

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhang24ay,
  title     = {How Language Model Hallucinations Can Snowball},
  author    = {Zhang, Muru and Press, Ofir and Merrill, William and Liu, Alisa and Smith, Noah A.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {59670--59684},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24ay/zhang24ay.pdf},
  url       = {https://proceedings.mlr.press/v235/zhang24ay.html}
}
APA
Zhang, M., Press, O., Merrill, W., Liu, A. & Smith, N. A. (2024). How Language Model Hallucinations Can Snowball. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:59670-59684. Available from https://proceedings.mlr.press/v235/zhang24ay.html.
