Enhancing Uncertainty Quantification in Large Language Models through Semantic Graph Density

Zhaoye Li, Siyuan Shen, Wenjing Yang, Ruochun Jin, Huan Chen, Ligong Cao, Jing Ren
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:2537-2551, 2025.

Abstract

Large Language Models (LLMs) excel in language understanding but are susceptible to "confabulation," where they generate arbitrary, factually incorrect responses to uncertain questions. Detecting confabulation in question answering often relies on Uncertainty Quantification (UQ), which measures semantic entropy or consistency among sampled answers. While several methods have been proposed for UQ in LLMs, they suffer from key limitations, such as overlooking fine-grained semantic relationships among answers and neglecting answer probabilities. To address these issues, we propose Semantic Graph Density (SGD). SGD quantifies semantic consistency by evaluating the density of a semantic graph that captures fine-grained semantic relationships among answers. Additionally, it integrates answer probabilities to adjust the contribution of each edge to the overall uncertainty score. We theoretically prove that SGD generalizes the previous state-of-the-art method, Deg, and empirically demonstrate its superior performance across four LLMs and four free-form question-answering datasets. In particular, in experiments with Llama3.1-8B, SGD outperformed the best baseline by 1.52% in AUROC on the CoQA dataset and by 1.22% in AUARC on the TriviaQA dataset.
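The core idea described in the abstract — scoring uncertainty via the density of a semantic graph whose edges are weighted by answer probabilities — can be illustrated with a toy sketch. This is not the paper's actual SGD formula (which is not reproduced on this page); the weighting scheme, the use of a pairwise similarity matrix, and the function name below are all illustrative assumptions.

```python
import numpy as np

def semantic_graph_density_score(sim, probs):
    """Toy uncertainty score in the spirit of SGD (illustrative, not the
    paper's definition).

    sim:   (n, n) pairwise semantic-similarity matrix in [0, 1],
           e.g. entailment scores between sampled answers.
    probs: (n,) probabilities of the n sampled answers.
    Returns a value in [0, 1]; higher means more uncertainty.
    """
    sim = np.asarray(sim, dtype=float)
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                      # normalise answer probabilities
    n = len(p)
    # Probability-weighted edge contributions over all unordered pairs:
    # an edge between two likely answers counts more than one between
    # two improbable answers.
    num = sum(p[i] * p[j] * sim[i, j]
              for i in range(n) for j in range(i + 1, n))
    den = sum(p[i] * p[j]
              for i in range(n) for j in range(i + 1, n))
    density = num / den if den > 0 else 0.0
    return 1.0 - density                 # dense graph = consistent answers

# Mutually consistent answers should yield a lower uncertainty score
# than mutually inconsistent ones.
consistent = np.array([[1.0, 0.9, 0.8],
                       [0.9, 1.0, 0.9],
                       [0.8, 0.9, 1.0]])
inconsistent = np.array([[1.0, 0.1, 0.2],
                         [0.1, 1.0, 0.1],
                         [0.2, 0.1, 1.0]])
probs = [0.5, 0.3, 0.2]
low = semantic_graph_density_score(consistent, probs)
high = semantic_graph_density_score(inconsistent, probs)
```

In this sketch, a dense, strongly connected graph (answers that all entail each other) produces a score near 0, while a sparse graph of semantically unrelated answers pushes the score toward 1, matching the intuition that confabulation shows up as low semantic consistency among samples.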

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-li25b,
  title     = {Enhancing Uncertainty Quantification in Large Language Models through Semantic Graph Density},
  author    = {Li, Zhaoye and Shen, Siyuan and Yang, Wenjing and Jin, Ruochun and Chen, Huan and Cao, Ligong and Ren, Jing},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {2537--2551},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/li25b/li25b.pdf},
  url       = {https://proceedings.mlr.press/v286/li25b.html},
  abstract  = {Large Language Models (LLMs) excel in language understanding but are susceptible to "confabulation," where they generate arbitrary, factually incorrect responses to uncertain questions. Detecting confabulation in question answering often relies on Uncertainty Quantification (UQ), which measures semantic entropy or consistency among sampled answers. While several methods have been proposed for UQ in LLMs, they suffer from key limitations, such as overlooking fine-grained semantic relationships among answers and neglecting answer probabilities. To address these issues, we propose Semantic Graph Density (SGD). SGD quantifies semantic consistency by evaluating the density of a semantic graph that captures fine-grained semantic relationships among answers. Additionally, it integrates answer probabilities to adjust the contribution of each edge to the overall uncertainty score. We theoretically prove that SGD generalizes the previous state-of-the-art method, Deg, and empirically demonstrate its superior performance across four LLMs and four free-form question-answering datasets. In particular, in experiments with Llama3.1-8B, SGD outperformed the best baseline by 1.52% in AUROC on the CoQA dataset and by 1.22% in AUARC on the TriviaQA dataset.}
}
Endnote
%0 Conference Paper
%T Enhancing Uncertainty Quantification in Large Language Models through Semantic Graph Density
%A Zhaoye Li
%A Siyuan Shen
%A Wenjing Yang
%A Ruochun Jin
%A Huan Chen
%A Ligong Cao
%A Jing Ren
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-li25b
%I PMLR
%P 2537--2551
%U https://proceedings.mlr.press/v286/li25b.html
%V 286
%X Large Language Models (LLMs) excel in language understanding but are susceptible to "confabulation," where they generate arbitrary, factually incorrect responses to uncertain questions. Detecting confabulation in question answering often relies on Uncertainty Quantification (UQ), which measures semantic entropy or consistency among sampled answers. While several methods have been proposed for UQ in LLMs, they suffer from key limitations, such as overlooking fine-grained semantic relationships among answers and neglecting answer probabilities. To address these issues, we propose Semantic Graph Density (SGD). SGD quantifies semantic consistency by evaluating the density of a semantic graph that captures fine-grained semantic relationships among answers. Additionally, it integrates answer probabilities to adjust the contribution of each edge to the overall uncertainty score. We theoretically prove that SGD generalizes the previous state-of-the-art method, Deg, and empirically demonstrate its superior performance across four LLMs and four free-form question-answering datasets. In particular, in experiments with Llama3.1-8B, SGD outperformed the best baseline by 1.52% in AUROC on the CoQA dataset and by 1.22% in AUARC on the TriviaQA dataset.
APA
Li, Z., Shen, S., Yang, W., Jin, R., Chen, H., Cao, L. &amp; Ren, J. (2025). Enhancing Uncertainty Quantification in Large Language Models through Semantic Graph Density. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:2537-2551. Available from https://proceedings.mlr.press/v286/li25b.html.

Related Material