Evaluation of Multi-Agent LLMs in Multidisciplinary Team Decision-Making for Challenging Cancer Cases

Jaesik Kim, Byounghan Lee, Kyung-Ah Sohn, Dokyoon Kim, Young Chan Lee
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.

Abstract

This study explores the potential of large language model (LLM) agents in real-world clinical decision-making, focusing on their alignment with human experts in cancer multidisciplinary team (MDT) meetings. While LLMs perform well on benchmark medical question-answering tasks, these evaluations often oversimplify the open-ended, multifaceted nature of actual clinical decisions. In practice, MDTs require balancing diverse expert opinions and multiple valid treatment options. Using real MDT meeting data, we compare different LLM approaches including single-agent and multi-agent systems to assess their ability to replicate consensus-based decisions. Our findings indicate that multi-agent, conversation-based systems, which assign specialized roles and facilitate dynamic inter-agent conversation, better approximate human expert decisions in our data. Overall, this work highlights the potential practical utility of LLM agents in complex clinical settings and lays the groundwork for their future integration as decision support tools in multidisciplinary medical contexts.

Cite this Paper


BibTeX
@InProceedings{pmlr-v298-kim25a,
  title     = {Evaluation of Multi-Agent {LLM}s in Multidisciplinary Team Decision-Making for Challenging Cancer Cases},
  author    = {Kim, Jaesik and Lee, Byounghan and Sohn, Kyung-Ah and Kim, Dokyoon and Lee, Young Chan},
  booktitle = {Proceedings of the 10th Machine Learning for Healthcare Conference},
  year      = {2025},
  editor    = {Agrawal, Monica and Deshpande, Kaivalya and Engelhard, Matthew and Joshi, Shalmali and Tang, Shengpu and Urteaga, Iñigo},
  volume    = {298},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v298/main/assets/kim25a/kim25a.pdf},
  url       = {https://proceedings.mlr.press/v298/kim25a.html},
  abstract  = {This study explores the potential of large language model (LLM) agents in real-world clinical decision-making, focusing on their alignment with human experts in cancer multidisciplinary team (MDT) meetings. While LLMs perform well on benchmark medical question-answering tasks, these evaluations often oversimplify the open-ended, multifaceted nature of actual clinical decisions. In practice, MDTs require balancing diverse expert opinions and multiple valid treatment options. Using real MDT meeting data, we compare different LLM approaches including single-agent and multi-agent systems to assess their ability to replicate consensus-based decisions. Our findings indicate that multi-agent, conversation-based systems, which assign specialized roles and facilitate dynamic inter-agent conversation, better approximate human expert decisions in our data. Overall, this work highlights the potential practical utility of LLM agents in complex clinical settings and lays the groundwork for their future integration as decision support tools in multidisciplinary medical contexts.}
}
Endnote
%0 Conference Paper
%T Evaluation of Multi-Agent LLMs in Multidisciplinary Team Decision-Making for Challenging Cancer Cases
%A Jaesik Kim
%A Byounghan Lee
%A Kyung-Ah Sohn
%A Dokyoon Kim
%A Young Chan Lee
%B Proceedings of the 10th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Monica Agrawal
%E Kaivalya Deshpande
%E Matthew Engelhard
%E Shalmali Joshi
%E Shengpu Tang
%E Iñigo Urteaga
%F pmlr-v298-kim25a
%I PMLR
%U https://proceedings.mlr.press/v298/kim25a.html
%V 298
%X This study explores the potential of large language model (LLM) agents in real-world clinical decision-making, focusing on their alignment with human experts in cancer multidisciplinary team (MDT) meetings. While LLMs perform well on benchmark medical question-answering tasks, these evaluations often oversimplify the open-ended, multifaceted nature of actual clinical decisions. In practice, MDTs require balancing diverse expert opinions and multiple valid treatment options. Using real MDT meeting data, we compare different LLM approaches including single-agent and multi-agent systems to assess their ability to replicate consensus-based decisions. Our findings indicate that multi-agent, conversation-based systems, which assign specialized roles and facilitate dynamic inter-agent conversation, better approximate human expert decisions in our data. Overall, this work highlights the potential practical utility of LLM agents in complex clinical settings and lays the groundwork for their future integration as decision support tools in multidisciplinary medical contexts.
APA
Kim, J., Lee, B., Sohn, K.-A., Kim, D., & Lee, Y. C. (2025). Evaluation of Multi-Agent LLMs in Multidisciplinary Team Decision-Making for Challenging Cancer Cases. Proceedings of the 10th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 298. Available from https://proceedings.mlr.press/v298/kim25a.html.
