In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:7553-7567, 2024.

Abstract

Large language models (LLMs) frequently hallucinate, e.g., by making factual errors, yet our understanding of why they make these errors remains limited. In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of inner representations. We discover a pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect generations do. Leveraging this signal, we propose an entropy-based metric to quantify the sharpness among the in-context hidden states and incorporate it into the decoding process, i.e., we use the entropy value to adjust the next-token prediction distribution to improve the factuality and overall quality of the generated text. Experiments on knowledge-seeking datasets (Natural Questions, HotpotQA, TriviaQA) and a hallucination benchmark (TruthfulQA) demonstrate the consistent effectiveness of our approach, e.g., an improvement of up to 8.6 absolute points on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
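
The abstract describes the method only at a high level. The PyTorch sketch below illustrates one plausible way an entropy-based sharpness signal over in-context hidden states could be folded into next-token decoding. It is a minimal illustration under assumptions rather than the authors' implementation: the function names (sharpness_entropy, adjust_next_token_logits), the use of the LM head weights as the projection, the top-k candidate restriction, and the weighting hyperparameter alpha are all illustrative choices not specified in the abstract.

import torch
import torch.nn.functional as F


def sharpness_entropy(hidden_states: torch.Tensor,
                      unembedding: torch.Tensor,
                      candidate_ids: torch.Tensor) -> torch.Tensor:
    """For each candidate next token, measure how sharply the in-context hidden
    states activate for it: project every context hidden state onto the
    candidate's output embedding, normalize over context positions, and return
    the entropy of that distribution (lower entropy = sharper activation).

    hidden_states: (ctx_len, d)  hidden states of the in-context tokens
    unembedding:   (vocab, d)    LM head / output embedding matrix (assumed projection)
    candidate_ids: (k,)          ids of the candidate next tokens
    """
    logits = hidden_states @ unembedding[candidate_ids].T       # (ctx_len, k)
    probs = F.softmax(logits, dim=0)                            # distribution over context positions
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=0)   # (k,) entropies


def adjust_next_token_logits(lm_logits: torch.Tensor,
                             hidden_states: torch.Tensor,
                             unembedding: torch.Tensor,
                             alpha: float = 1.0,
                             top_k: int = 50) -> torch.Tensor:
    """Penalize candidate tokens whose in-context activations are diffuse
    (high entropy); alpha and top_k are illustrative hyperparameters."""
    candidate_ids = lm_logits.topk(top_k).indices
    entropy = sharpness_entropy(hidden_states, unembedding, candidate_ids)
    adjusted = torch.full_like(lm_logits, float("-inf"))        # keep only the top-k candidates in play
    adjusted[candidate_ids] = lm_logits[candidate_ids] - alpha * entropy
    return adjusted

In a greedy-decoding loop, one would call adjust_next_token_logits on the final-position logits at each step and take the argmax of the adjusted scores; alpha trades off the model's original next-token distribution against the sharpness signal.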

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-chen24av,
  title     = {In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation},
  author    = {Chen, Shiqi and Xiong, Miao and Liu, Junteng and Wu, Zhengxuan and Xiao, Teng and Gao, Siyang and He, Junxian},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {7553--7567},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24av/chen24av.pdf},
  url       = {https://proceedings.mlr.press/v235/chen24av.html},
  abstract  = {Large language models (LLMs) frequently hallucinate, e.g., making factual errors, yet our understanding of why they make these errors remains limited. In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of inner representations. We discover a pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens, compared to that of the incorrect generations. Leveraging this signal, we propose an entropy-based metric to quantify the sharpness among the in-context hidden states and incorporate it into the decoding process, i.e., use the entropy value to adjust the next token prediction distribution to improve the factuality and overall quality of the generated text. Experiments on knowledge-seeking datasets (Natural Questions, HotpotQA, TriviaQA) and hallucination benchmark (TruthfulQA) demonstrate our consistent effectiveness, e.g., up to 8.6 absolute points on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.}
}
Endnote
%0 Conference Paper
%T In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
%A Shiqi Chen
%A Miao Xiong
%A Junteng Liu
%A Zhengxuan Wu
%A Teng Xiao
%A Siyang Gao
%A Junxian He
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-chen24av
%I PMLR
%P 7553--7567
%U https://proceedings.mlr.press/v235/chen24av.html
%V 235
%X Large language models (LLMs) frequently hallucinate, e.g., making factual errors, yet our understanding of why they make these errors remains limited. In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of inner representations. We discover a pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens, compared to that of the incorrect generations. Leveraging this signal, we propose an entropy-based metric to quantify the sharpness among the in-context hidden states and incorporate it into the decoding process, i.e., use the entropy value to adjust the next token prediction distribution to improve the factuality and overall quality of the generated text. Experiments on knowledge-seeking datasets (Natural Questions, HotpotQA, TriviaQA) and hallucination benchmark (TruthfulQA) demonstrate our consistent effectiveness, e.g., up to 8.6 absolute points on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
APA
Chen, S., Xiong, M., Liu, J., Wu, Z., Xiao, T., Gao, S. & He, J. (2024). In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:7553-7567. Available from https://proceedings.mlr.press/v235/chen24av.html.
