Quantifying Uncertainty in Natural Language Explanations of Large Language Models

Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1072-1080, 2024.

Abstract

Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, there is no certainty whether these explanations are reliable and reflect the LLM’s behavior. In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. To this end, we propose two novel metrics — Verbalized Uncertainty and Probing Uncertainty — to quantify the uncertainty of generated explanations. While verbalized uncertainty involves prompting the LLM to express its confidence in its explanations, probing uncertainty leverages sample and model perturbations as a means to quantify the uncertainty. Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence. Further, we show that the probing uncertainty estimates are correlated with the faithfulness of an explanation, with lower uncertainty corresponding to explanations with higher faithfulness. Our study provides insights into the challenges and opportunities of quantifying uncertainty in LLM explanations, contributing to the broader discussion of the trustworthiness of foundation models.
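As a rough illustrative sketch (not the authors' implementation), the two metrics described above could be instantiated along the following lines. The query_llm, perturb, and similarity helpers are hypothetical placeholders, and only the sample-perturbation side of probing uncertainty is shown; the paper itself also considers model perturbations.

```python
# Illustrative sketch only -- not the paper's implementation.
# Assumes hypothetical helpers:
#   query_llm(prompt) -> str          (returns the model's text output)
#   perturb(question) -> str          (e.g., paraphrase or add noise)
#   similarity(expl_a, expl_b) -> float in [0, 1]

def verbalized_uncertainty(question, explanation, query_llm):
    """Prompt the LLM to state its confidence in its own explanation."""
    prompt = (
        f"Question: {question}\n"
        f"Explanation: {explanation}\n"
        "On a scale of 0 to 100, how confident are you that this "
        "explanation reflects your reasoning? Answer with a number only."
    )
    confidence = float(query_llm(prompt))  # parsing simplified for brevity
    return 1.0 - confidence / 100.0        # higher value = more uncertain


def probing_uncertainty(question, explanation, query_llm, perturb,
                        similarity, n_probes=5):
    """Perturb the input, regenerate explanations, and measure how much
    they deviate from the original explanation."""
    agreements = []
    for _ in range(n_probes):
        perturbed_q = perturb(question)
        new_expl = query_llm(
            f"Question: {perturbed_q}\nExplain your answer step by step."
        )
        agreements.append(similarity(explanation, new_expl))
    mean_agreement = sum(agreements) / len(agreements)
    return 1.0 - mean_agreement            # higher value = more uncertain
```

Under this reading, a low probing-uncertainty score indicates that the explanation is stable under perturbation, which the paper reports as correlating with higher explanation faithfulness.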

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-harsha-tanneru24a,
  title     = {Quantifying Uncertainty in Natural Language Explanations of Large Language Models},
  author    = {Harsha Tanneru, Sree and Agarwal, Chirag and Lakkaraju, Himabindu},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {1072--1080},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/harsha-tanneru24a/harsha-tanneru24a.pdf},
  url       = {https://proceedings.mlr.press/v238/harsha-tanneru24a.html},
  abstract  = {Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, there is no certainty whether these explanations are reliable and reflect the LLM’s behavior. In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. To this end, we propose two novel metrics — Verbalized Uncertainty and Probing Uncertainty — to quantify the uncertainty of generated explanations. While verbalized uncertainty involves prompting the LLM to express its confidence in its explanations, probing uncertainty leverages sample and model perturbations as a means to quantify the uncertainty. Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence. Further, we show that the probing uncertainty estimates are correlated with the faithfulness of an explanation, with lower uncertainty corresponding to explanations with higher faithfulness. Our study provides insights into the challenges and opportunities of quantifying uncertainty in LLM explanations, contributing to the broader discussion of the trustworthiness of foundation models.}
}
Endnote
%0 Conference Paper
%T Quantifying Uncertainty in Natural Language Explanations of Large Language Models
%A Sree Harsha Tanneru
%A Chirag Agarwal
%A Himabindu Lakkaraju
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-harsha-tanneru24a
%I PMLR
%P 1072--1080
%U https://proceedings.mlr.press/v238/harsha-tanneru24a.html
%V 238
%X Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, there is no certainty whether these explanations are reliable and reflect the LLM’s behavior. In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. To this end, we propose two novel metrics — Verbalized Uncertainty and Probing Uncertainty — to quantify the uncertainty of generated explanations. While verbalized uncertainty involves prompting the LLM to express its confidence in its explanations, probing uncertainty leverages sample and model perturbations as a means to quantify the uncertainty. Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence. Further, we show that the probing uncertainty estimates are correlated with the faithfulness of an explanation, with lower uncertainty corresponding to explanations with higher faithfulness. Our study provides insights into the challenges and opportunities of quantifying uncertainty in LLM explanations, contributing to the broader discussion of the trustworthiness of foundation models.
APA
Harsha Tanneru, S., Agarwal, C. & Lakkaraju, H. (2024). Quantifying Uncertainty in Natural Language Explanations of Large Language Models. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1072-1080. Available from https://proceedings.mlr.press/v238/harsha-tanneru24a.html.
