Auditing Citation Behavior in AI-Generated Search Summaries: A Framework and a Case Study of Google AI Overviews

Rustem Kakimov, Xing Tan, Jonathan Gillham, Narcis Bejtic
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:224-235, 2026.

Abstract

Search engines increasingly integrate Large Language Models (LLMs) to generate natural-language summaries with cited sources, while a growing fraction of online content is partially or fully AI-generated. This convergence raises new questions about how generative search systems select citation sources, particularly with respect to document provenance. In this paper, we propose a system-agnostic observational framework for auditing citation behavior in AI-generated search summaries, modeling retrieval and citation as observable processes over query-document pairs and introducing rank- and provenance-conditioned citation measures. We instantiate the framework in a large-scale empirical study of Google AI Overviews on "Your Money or Your Life" queries drawn from the MS MARCO Web Search dataset. Our analysis shows that AI-generated documents are cited more frequently than human-authored documents even after controlling for retrieval rank, with the difference driven primarily by non-retrieved citations and most pronounced at highly ranked positions. These results highlight the importance of transparent, measurement-based auditing for understanding citation behavior in generative search systems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-kakimov26a,
  title = {Auditing Citation Behavior in AI-Generated Search Summaries: A Framework and a Case Study of Google AI Overviews},
  author = {Kakimov, Rustem and Tan, Xing and Gillham, Jonathan and Bejtic, Narcis},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {224--235},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/kakimov26a/kakimov26a.pdf},
  url = {https://proceedings.mlr.press/v318/kakimov26a.html},
  abstract = {Search engines increasingly integrate Large Language Models (LLMs) to generate natural-language summaries with cited sources, while a growing fraction of online content is partially or fully AI-generated. This convergence raises new questions about how generative search systems select citation sources, particularly with respect to document provenance. In this paper, we propose a system-agnostic observational framework for auditing citation behavior in AI-generated search summaries, modeling retrieval and citation as observable processes over query-document pairs and introducing rank- and provenance-conditioned citation measures. We instantiate the framework in a large-scale empirical study of Google AI Overviews on "Your Money or Your Life" queries drawn from the MS MARCO Web Search dataset. Our analysis shows that AI-generated documents are cited more frequently than human-authored documents even after controlling for retrieval rank, with the difference driven primarily by non-retrieved citations and most pronounced at highly ranked positions. These results highlight the importance of transparent, measurement-based auditing for understanding citation behavior in generative search systems.}
}
Endnote
%0 Conference Paper
%T Auditing Citation Behavior in AI-Generated Search Summaries: A Framework and a Case Study of Google AI Overviews
%A Rustem Kakimov
%A Xing Tan
%A Jonathan Gillham
%A Narcis Bejtic
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-kakimov26a
%I PMLR
%P 224--235
%U https://proceedings.mlr.press/v318/kakimov26a.html
%V 318
%X Search engines increasingly integrate Large Language Models (LLMs) to generate natural-language summaries with cited sources, while a growing fraction of online content is partially or fully AI-generated. This convergence raises new questions about how generative search systems select citation sources, particularly with respect to document provenance. In this paper, we propose a system-agnostic observational framework for auditing citation behavior in AI-generated search summaries, modeling retrieval and citation as observable processes over query-document pairs and introducing rank- and provenance-conditioned citation measures. We instantiate the framework in a large-scale empirical study of Google AI Overviews on "Your Money or Your Life" queries drawn from the MS MARCO Web Search dataset. Our analysis shows that AI-generated documents are cited more frequently than human-authored documents even after controlling for retrieval rank, with the difference driven primarily by non-retrieved citations and most pronounced at highly ranked positions. These results highlight the importance of transparent, measurement-based auditing for understanding citation behavior in generative search systems.
APA
Kakimov, R., Tan, X., Gillham, J., & Bejtic, N. (2026). Auditing Citation Behavior in AI-Generated Search Summaries: A Framework and a Case Study of Google AI Overviews. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:224-235. Available from https://proceedings.mlr.press/v318/kakimov26a.html.
