Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs

Shmuel Berman, Yuval Kansal, Lydia Liu
Proceedings of the Innovation and Responsibility in AI-Supported Education Workshop, PMLR 273:238-244, 2025.

Abstract

Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.

Cite this Paper


BibTeX
@InProceedings{pmlr-v273-berman25a,
  title     = {Facts Do Care About Your Language: Assessing Answer Quality of Multilingual {LLM}s},
  author    = {Berman, Shmuel and Kansal, Yuval and Liu, Lydia},
  booktitle = {Proceedings of the Innovation and Responsibility in AI-Supported Education Workshop},
  pages     = {238--244},
  year      = {2025},
  editor    = {Wang, Zichao and Woodhead, Simon and Ananda, Muktha and Mallick, Debshila Basu and Sharpnack, James and Burstein, Jill},
  volume    = {273},
  series    = {Proceedings of Machine Learning Research},
  month     = {03 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v273/main/assets/berman25a/berman25a.pdf},
  url       = {https://proceedings.mlr.press/v273/berman25a.html},
  abstract  = {Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.}
}
Endnote
%0 Conference Paper
%T Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs
%A Shmuel Berman
%A Yuval Kansal
%A Lydia Liu
%B Proceedings of the Innovation and Responsibility in AI-Supported Education Workshop
%C Proceedings of Machine Learning Research
%D 2025
%E Zichao Wang
%E Simon Woodhead
%E Muktha Ananda
%E Debshila Basu Mallick
%E James Sharpnack
%E Jill Burstein
%F pmlr-v273-berman25a
%I PMLR
%P 238--244
%U https://proceedings.mlr.press/v273/berman25a.html
%V 273
%X Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.
APA
Berman, S., Kansal, Y., & Liu, L. (2025). Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs. Proceedings of the Innovation and Responsibility in AI-Supported Education Workshop, in Proceedings of Machine Learning Research 273:238-244. Available from https://proceedings.mlr.press/v273/berman25a.html.