Evaluation of Multi-Agent LLMs in Multidisciplinary Team Decision-Making for Challenging Cancer Cases
Proceedings of the 10th Machine Learning for Healthcare Conference, PMLR 298, 2025.
Abstract
This study explores the potential of large language model (LLM) agents in real-world clinical decision-making, focusing on their alignment with human experts in cancer multidisciplinary team (MDT) meetings. While LLMs perform well on benchmark medical question-answering tasks, these evaluations often oversimplify the open-ended, multifaceted nature of actual clinical decisions. In practice, MDTs require balancing diverse expert opinions and multiple valid treatment options. Using real MDT meeting data, we compare LLM approaches, including single-agent and multi-agent systems, to assess their ability to replicate consensus-based decisions. Our findings indicate that multi-agent, conversation-based systems, which assign specialized roles and facilitate dynamic inter-agent conversation, better approximate human expert decisions in our data. Overall, this work highlights the potential practical utility of LLM agents in complex clinical settings and lays the groundwork for their future integration as decision support tools in multidisciplinary medical contexts.