AI Psychiatrist Assistant: An LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews

Adam Greene, Neviah Blair, Samin Mahdipour Aghabagher, Simmi Kumari, Michael W. Schlund, Alex Fedorov, Vince D. Calhoun, Xinhui Li, Rogers F. Silva
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:525-542, 2026.

Abstract

Depression is one of the most common mental disorders yet remains underdiagnosed. Large language models (LLMs) have shown promise in their ability to understand the semantic meaning behind medical text and automate clinical workflows through collaborative agents. Here, we propose an LLM-based multi-agent system to diagnose depression symptoms from clinical interview transcripts. Our system integrates four agents: (1) a qualitative assessment agent that identifies symptoms and risk factors, (2) a judge agent that evaluates qualitative assessment through iterative self-refinement, (3) a quantitative assessment agent that predicts clinical scores using a novel embedding-based few-shot prompting approach, and (4) a meta-review agent that integrates outputs into a comprehensive overview of a patient’s mental state. The qualitative assessment agent provided coherent, specific, and reasonably accurate assessment, as evaluated by both the human expert and the judge agent. The quantitative assessment agent with few-shot prompting showed an average mean absolute error of 0.619 for symptom prediction versus 0.796 in zero-shot prompting, while the meta-review agent achieved a binary classification accuracy of 78%, comparable to that of a human expert. Our system could serve as a consultant for psychiatrists and psychologists, offering an alternative perspective on patients’ mental health conditions, and thus establishing a foundation for future work on agent-aided clinical support.
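
The abstract does not specify how the embedding-based few-shot prompting or the error metric is implemented. The following is only a minimal sketch of the general idea, assuming reference excerpts are ranked by cosine similarity to the query embedding and the closest ones are placed in the prompt as scored examples. All names (`select_examples`, `build_prompt`, `mean_absolute_error`), the toy two-dimensional vectors, the prompt wording, and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_examples(query_vec, bank, k=2):
    """Return the k bank entries most similar to the query.

    bank is a list of (vector, text, score) tuples; this layout is an
    assumption made for illustration only.
    """
    ranked = sorted(bank, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return ranked[:k]

def build_prompt(query_text, examples):
    """Assemble a few-shot prompt: scored examples first, then the query."""
    blocks = [f"Transcript excerpt: {text}\nScore: {score}" for _, text, score in examples]
    blocks.append(f"Transcript excerpt: {query_text}\nScore:")
    return "\n\n".join(blocks)

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and reference scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Toy reference bank with made-up 2-D "embeddings" and made-up scores.
bank = [
    ([1.0, 0.0], "I sleep fine most nights.", 0),
    ([0.0, 1.0], "I barely sleep at all.", 3),
    ([0.1, 0.9], "I wake up several times a night.", 2),
]
query_vec = [0.0, 1.0]
examples = select_examples(query_vec, bank, k=2)
prompt = build_prompt("I can't fall asleep until morning.", examples)
```

In a real pipeline the vectors would come from an embedding model and the prompt would be sent to an LLM; both steps are left out here because the abstract does not describe them.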

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-greene26a,
  title = {{AI} Psychiatrist Assistant: An {LLM}-based Multi-Agent System for Depression Assessment from Clinical Interviews},
  author = {Greene, Adam and Blair, Neviah and Mahdipour Aghabagher, Samin and Kumari, Simmi and Schlund, Michael W. and Fedorov, Alex and Calhoun, Vince D. and Li, Xinhui and Silva, Rogers F.},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {525--542},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/greene26a/greene26a.pdf},
  url = {https://proceedings.mlr.press/v297/greene26a.html},
  abstract = {Depression is one of the most common mental disorders yet remains underdiagnosed. Large language models ({LLM}s) have shown promise in their ability to understand the semantic meaning behind medical text and automate clinical workflows through collaborative agents. Here, we propose an {LLM}-based multi-agent system to diagnose depression symptoms from clinical interview transcripts. Our system integrates four agents: (1) a qualitative assessment agent that identifies symptoms and risk factors, (2) a judge agent that evaluates qualitative assessment through iterative self-refinement, (3) a quantitative assessment agent that predicts clinical scores using a novel embedding-based few-shot prompting approach, and (4) a meta-review agent that integrates outputs into a comprehensive overview of a patient’s mental state. The qualitative assessment agent provided coherent, specific, and reasonably accurate assessment, as evaluated by both the human expert and the judge agent. The quantitative assessment agent with few-shot prompting showed an average mean absolute error of 0.619 for symptom prediction versus 0.796 in zero-shot prompting, while the meta-review agent achieved a binary classification accuracy of 78%, comparable to that of a human expert. Our system could serve as a consultant for psychiatrists and psychologists, offering an alternative perspective on patients’ mental health conditions, and thus establishing a foundation for future work on agent-aided clinical support.}
}
Endnote
%0 Conference Paper
%T AI Psychiatrist Assistant: An LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews
%A Adam Greene
%A Neviah Blair
%A Samin Mahdipour Aghabagher
%A Simmi Kumari
%A Michael W. Schlund
%A Alex Fedorov
%A Vince D. Calhoun
%A Xinhui Li
%A Rogers F. Silva
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-greene26a
%I PMLR
%P 525--542
%U https://proceedings.mlr.press/v297/greene26a.html
%V 297
%X Depression is one of the most common mental disorders yet remains underdiagnosed. Large language models (LLMs) have shown promise in their ability to understand the semantic meaning behind medical text and automate clinical workflows through collaborative agents. Here, we propose an LLM-based multi-agent system to diagnose depression symptoms from clinical interview transcripts. Our system integrates four agents: (1) a qualitative assessment agent that identifies symptoms and risk factors, (2) a judge agent that evaluates qualitative assessment through iterative self-refinement, (3) a quantitative assessment agent that predicts clinical scores using a novel embedding-based few-shot prompting approach, and (4) a meta-review agent that integrates outputs into a comprehensive overview of a patient’s mental state. The qualitative assessment agent provided coherent, specific, and reasonably accurate assessment, as evaluated by both the human expert and the judge agent. The quantitative assessment agent with few-shot prompting showed an average mean absolute error of 0.619 for symptom prediction versus 0.796 in zero-shot prompting, while the meta-review agent achieved a binary classification accuracy of 78%, comparable to that of a human expert. Our system could serve as a consultant for psychiatrists and psychologists, offering an alternative perspective on patients’ mental health conditions, and thus establishing a foundation for future work on agent-aided clinical support.
APA
Greene, A., Blair, N., Mahdipour Aghabagher, S., Kumari, S., Schlund, M.W., Fedorov, A., Calhoun, V.D., Li, X., & Silva, R.F. (2026). AI Psychiatrist Assistant: An LLM-based Multi-Agent System for Depression Assessment from Clinical Interviews. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:525-542. Available from https://proceedings.mlr.press/v297/greene26a.html.
