Citation Constraints and Reference Hallucinations in Large Language Models

Kimberly Davis, Qusay Mahmoud
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:464-475, 2026.

Abstract

This paper investigates reference hallucinations in large language models (LLMs) under different prompting constraints. Thirty-six academic-style documents were generated across four systems: Gemini 3, ChatGPT 5.1, ChatGPT 4o, and Microsoft 365 Copilot, and evaluated using an automated citation verification method that cross-checks references against Crossref, OpenAlex, and arXiv. The results show that stricter citation requirements are associated with higher rates of invalid or inconsistent references, whereas unconstrained prompts more frequently produce unsupported conceptual claims rather than fabricated citations. These findings indicate that hallucination behaviour depends on task structure rather than simply topic difficulty, highlighting the importance of prompt design and verification when LLMs are used for research-style writing and literature assistance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-davis26a,
  title     = {Citation Constraints and Reference Hallucinations in Large Language Models},
  author    = {Davis, Kimberly and Mahmoud, Qusay},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages     = {464--475},
  year      = {2026},
  editor    = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume    = {318},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--29 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/davis26a/davis26a.pdf},
  url       = {https://proceedings.mlr.press/v318/davis26a.html},
  abstract  = {This paper investigates reference hallucinations in large language models (LLMs) under different prompting constraints. Thirty-six academic-style documents were generated across four systems: Gemini 3, ChatGPT 5.1, ChatGPT 4o, and Microsoft 365 Copilot, and evaluated using an automated citation verification method that cross-checks references against Crossref, OpenAlex, and arXiv. The results show that stricter citation requirements are associated with higher rates of invalid or inconsistent references, whereas unconstrained prompts more frequently produce unsupported conceptual claims rather than fabricated citations. These findings indicate that hallucination behaviour depends on task structure rather than simply topic difficulty, highlighting the importance of prompt design and verification when LLMs are used for research-style writing and literature assistance.}
}
Endnote
%0 Conference Paper
%T Citation Constraints and Reference Hallucinations in Large Language Models
%A Kimberly Davis
%A Qusay Mahmoud
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-davis26a
%I PMLR
%P 464--475
%U https://proceedings.mlr.press/v318/davis26a.html
%V 318
%X This paper investigates reference hallucinations in large language models (LLMs) under different prompting constraints. Thirty-six academic-style documents were generated across four systems: Gemini 3, ChatGPT 5.1, ChatGPT 4o, and Microsoft 365 Copilot, and evaluated using an automated citation verification method that cross-checks references against Crossref, OpenAlex, and arXiv. The results show that stricter citation requirements are associated with higher rates of invalid or inconsistent references, whereas unconstrained prompts more frequently produce unsupported conceptual claims rather than fabricated citations. These findings indicate that hallucination behaviour depends on task structure rather than simply topic difficulty, highlighting the importance of prompt design and verification when LLMs are used for research-style writing and literature assistance.
APA
Davis, K. & Mahmoud, Q. (2026). Citation Constraints and Reference Hallucinations in Large Language Models. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:464-475. Available from https://proceedings.mlr.press/v318/davis26a.html.
