Concept-Enhanced Automatic ICD Coding using Large Language Models
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:921-935, 2026.
Abstract
Automatic ICD coding is the task of assigning disease or procedure codes to clinical notes from patients' electronic health records. Large language models have been explored for this task, but none of the existing approaches has outperformed traditional deep learning models, owing to their limited ability to model concepts. Existing methods for ICD coding often use code descriptions or synonyms to enhance performance. In this paper, we propose to use concepts to expand the label space. Using the hierarchy of ICD codes, we construct concepts associated with the codes at different levels and employ fine-tuned large language models to obtain concept scores, which are then used for code prediction. Experiments on the MIMIC-III-50 and MIMIC-III-rare50 datasets demonstrate that our models achieve strong performance and substantially outperform previous state-of-the-art models. While the current evaluation is constrained in scope and by computational tractability, the results provide strong evidence for the potential of concept-driven LLM frameworks to advance automated medical coding.