Toward Believable Health & Wellness Conversational Agents: A Post-LLM Turing-like Evaluation Framework (Position Paper)
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:812-816, 2026.
Abstract
Large language model (LLM) conversational agents can be remarkably fluent yet still fail to feel fully “real” to users, especially in multi-session and higher-stakes interactions. This paper argues that the limiting problem is no longer surface language quality but believability: the conditions under which an artificial conversational partner is experienced as a coherent social mind rather than a fluent text generator. We frame believability as an empirical limit case and propose an operational criterion of bounded practical indistinguishability relative to an interaction envelope defined by a judge population, interaction contexts, and a time horizon. We then outline a “post-LLM Turing-like” evaluation approach that stress-tests modern detection cues using contextual scenario families, longitudinal re-contact, and multi-signal measurement combining human judgments with behavioral metrics. Finally, we instantiate the framework for a health and wellness agent being developed with an industry partner (details anonymized), arguing that wellness settings sharply amplify the importance of epistemic calibration, continuity, and boundary management. The goal is not to advocate deceptive deployment, but to make believability mechanistic and measurable so that both capabilities and risks can be assessed with clarity.