Can AI Assistants Know What They Don’t Know?

Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:8184-8202, 2024.

Abstract

AI assistants powered by Large Language Models (LLMs) have demonstrated impressive performance across a variety of tasks. However, LLMs still make factual errors on knowledge-intensive tasks such as open-domain question answering. Such untruthful responses from AI assistants can pose significant risks in practical applications. In this paper, we therefore ask: "Can AI assistants know what they don’t know, and can they express this awareness through natural language?" To investigate this, we construct a model-specific "I don’t know" (Idk) dataset, containing both supervised fine-tuning (SFT) data and preference data, in which questions are categorized by whether the assistant knows their answers. We then align the assistant with its corresponding Idk dataset using different alignment methods, including supervised fine-tuning and preference optimization. Experimental results show that, after alignment with the Idk dataset, the assistant is more capable of declining to answer questions outside its knowledge scope, and it exhibits significantly higher truthfulness than the original assistant.
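To make the pipeline the abstract describes concrete, below is a minimal Python sketch of how a model-specific Idk SFT set might be assembled: sample the assistant several times per question, mark a question "known" if enough samples match the gold answer, and map unknown questions to a refusal target. This is not the authors' released code; the ask_model callable, the containment-based correctness check, the sampling budget, and the 0.5 "known" threshold are all illustrative assumptions.

import random
from typing import Callable, Dict, List

IDK_RESPONSE = "I don't know the answer to this question."

def is_correct(prediction: str, gold: str) -> bool:
    # Loose containment check; real evaluations would use normalized EM/F1.
    return gold.strip().lower() in prediction.strip().lower()

def build_idk_sft_data(
    questions: List[Dict[str, str]],    # [{"question": ..., "answer": ...}]
    ask_model: Callable[[str], str],    # hypothetical: one sampled answer per call
    num_samples: int = 10,
    known_threshold: float = 0.5,       # assumed fraction of correct samples to count as "known"
) -> List[Dict[str, str]]:
    sft_data = []
    for item in questions:
        samples = [ask_model(item["question"]) for _ in range(num_samples)]
        accuracy = sum(is_correct(s, item["answer"]) for s in samples) / num_samples
        # Known question -> train the assistant to answer it;
        # unknown question -> train the assistant to decline.
        target = item["answer"] if accuracy >= known_threshold else IDK_RESPONSE
        sft_data.append({"prompt": item["question"], "response": target})
    return sft_data

if __name__ == "__main__":
    # Toy stand-in model that "knows" only the France question.
    def toy_model(q: str) -> str:
        return "Paris" if "France" in q else random.choice(["Berlin", "Rome"])

    rows = build_idk_sft_data(
        [{"question": "What is the capital of France?", "answer": "Paris"},
         {"question": "What is the capital of the fictional land of Ruritania?", "answer": "Strelsau"}],
        toy_model,
        num_samples=5,
    )
    for row in rows:
        print(row)

The threshold is the key design knob: raising it makes the assistant more conservative, trading answer coverage for truthfulness, while lowering it does the reverse. Preference data could be derived from the same labels by pairing the correct behavior (answer or refusal) as "chosen" against the opposite as "rejected".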

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-cheng24i,
  title     = {Can {AI} Assistants Know What They Don’t Know?},
  author    = {Cheng, Qinyuan and Sun, Tianxiang and Liu, Xiangyang and Zhang, Wenwei and Yin, Zhangyue and Li, Shimin and Li, Linyang and He, Zhengfu and Chen, Kai and Qiu, Xipeng},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {8184--8202},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24i/cheng24i.pdf},
  url       = {https://proceedings.mlr.press/v235/cheng24i.html},
  abstract  = {AI assistants powered by Large Language Models (LLMs) have demonstrated impressive performance in various tasks. However, LLMs still make factual errors in knowledge-intensive tasks such as open-domain question answering. These untruthful responses from AI assistants can pose significant risks in practical applications. Therefore, in this paper, we ask the question Can AI assistants know what they don’t know and express this awareness through natural language? To investigate this, we construct a model-specific "I don’t know" (Idk) dataset. This dataset includes Supervised Fine-tuning data and preference data, categorizing questions based on whether the assistant knows or does not know the answers. Then, we align the assistant with its corresponding Idk dataset using different alignment methods, including Supervised Fine-tuning and preference optimization. Experimental results show that, after alignment with the Idk dataset, the assistant is more capable of declining to answer questions outside its knowledge scope. The assistant aligned with the Idk dataset shows significantly higher truthfulness than the original assistant.}
}
Endnote
%0 Conference Paper
%T Can AI Assistants Know What They Don’t Know?
%A Qinyuan Cheng
%A Tianxiang Sun
%A Xiangyang Liu
%A Wenwei Zhang
%A Zhangyue Yin
%A Shimin Li
%A Linyang Li
%A Zhengfu He
%A Kai Chen
%A Xipeng Qiu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-cheng24i
%I PMLR
%P 8184--8202
%U https://proceedings.mlr.press/v235/cheng24i.html
%V 235
%X AI assistants powered by Large Language Models (LLMs) have demonstrated impressive performance in various tasks. However, LLMs still make factual errors in knowledge-intensive tasks such as open-domain question answering. These untruthful responses from AI assistants can pose significant risks in practical applications. Therefore, in this paper, we ask the question Can AI assistants know what they don’t know and express this awareness through natural language? To investigate this, we construct a model-specific "I don’t know" (Idk) dataset. This dataset includes Supervised Fine-tuning data and preference data, categorizing questions based on whether the assistant knows or does not know the answers. Then, we align the assistant with its corresponding Idk dataset using different alignment methods, including Supervised Fine-tuning and preference optimization. Experimental results show that, after alignment with the Idk dataset, the assistant is more capable of declining to answer questions outside its knowledge scope. The assistant aligned with the Idk dataset shows significantly higher truthfulness than the original assistant.
APA
Cheng, Q., Sun, T., Liu, X., Zhang, W., Yin, Z., Li, S., Li, L., He, Z., Chen, K. & Qiu, X. (2024). Can AI Assistants Know What They Don’t Know? Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:8184-8202. Available from https://proceedings.mlr.press/v235/cheng24i.html.