Learning Under Extreme Label Imbalance in EHRs: A Dependency-Aware Loss for Multi-Label Classification

Iris Szu-Szu Ho; Lars Werne; Konrad Rawlik; Bruce Guthrie; Sohan Seth

Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:880-904, 2026.

Abstract

Extreme multi-label next-visit diagnosis forecasting from electronic health records is dominated by label sparsity. Each visit contains only a handful of positive ICD-10 codes among thousands of candidates, yet codes are strongly correlated through comorbidity structure. In this regime, standard element-wise objectives (such as focal and class-balanced losses) often maximize sensitivity at the cost of severe precision degradation, producing clinically impractical alert volumes. We propose an architecture-compatible dependency-aware ranking loss that (i) reweights per-code correctness under severe imbalance, (ii) aggregates errors with rank-based emphasis on the hardest labels, and (iii) regularizes predictions with a learned pairwise dependency term in the output space. Using an EHR Transformer backbone, we evaluate on the CPRD cohort ($V{=}1{,}538$ codes), benchmarking loss functions on 200,000 patients and validating scalability up to 3.2 million. The proposed objective shifts the precision–recall trade-off toward fewer false positives while maintaining competitive sensitivity, and preserves overall ranking quality (PRC–AUC comparable to weighted BCE). In addition, it yields an auditable population-level dependency matrix summarizing learned co-occurrence structure. These results suggest that explicit output-space structure can improve the precision–recall trade-off in sparse, high-dimensional next-visit diagnosis prediction from EHRs.

Cite this Paper

BibTeX

@InProceedings{pmlr-v333-ho26a,
  title = 	 {Learning Under Extreme Label Imbalance in EHRs: A Dependency-Aware Loss for Multi-Label Classification},
  author =       {Ho, Iris Szu-Szu and Werne, Lars and Rawlik, Konrad and Guthrie, Bruce and Seth, Sohan},
  booktitle = 	 {Proceedings of the 7th Conference on Health, Inference, and Learning},
  pages = 	 {880--904},
  year = 	 {2026},
  editor = 	 {Healey, Elizabeth and Fries, Jason and Pollard, Tom and Tang, Shengpu and Zink, Anna and Hartvigsen, Tom and Agrawal, Monica and Finlayson, Sam and Glicksberg, Benjamin and Beaulieu-Jones, Brett and Wang, Kai and Fontalvo, Daseyra and Sarker, Tasmie and Chen, Irene and Alsentzer, Emily},
  volume = 	 {333},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29--30 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v333/main/assets/ho26a/ho26a.pdf},
  url = 	 {https://proceedings.mlr.press/v333/ho26a.html},
  abstract = 	 {Extreme multi-label next-visit diagnosis forecasting from electronic health records is dominated by label sparsity. Each visit contains only a handful of positive ICD-10 codes among thousands of candidates, yet codes are strongly correlated through comorbidity structure. In this regime, standard element-wise objectives (such as focal and class-balanced losses) often maximize sensitivity at the cost of severe precision degradation, producing clinically impractical alert volumes. We propose an architecture-compatible dependency-aware ranking loss that (i) reweights per-code correctness under severe imbalance, (ii) aggregates errors with rank-based emphasis on the hardest labels, and (iii) regularizes predictions with a learned pairwise dependency term in the output space. Using an EHR Transformer backbone, we evaluate on the CPRD cohort ($V{=}1{,}538$ codes), benchmarking loss functions on 200{,}000 patients and validating scalability up to 3.2 million. The proposed objective shifts the precision–recall trade-off toward fewer false positives while maintaining competitive sensitivity, and preserves overall ranking quality (PRC–AUC comparable to weighted BCE). In addition, it yields an auditable population-level dependency matrix summarizing learned co-occurrence structure. These results suggest that explicit output-space structure can improve the precision–recall trade-off in sparse, high-dimensional next-visit diagnosis prediction from EHRs.}
}

Endnote

%0 Conference Paper
%T Learning Under Extreme Label Imbalance in EHRs: A Dependency-Aware Loss for Multi-Label Classification
%A Iris Szu-Szu Ho
%A Lars Werne
%A Konrad Rawlik
%A Bruce Guthrie
%A Sohan Seth
%B Proceedings of the 7th Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Elizabeth Healey
%E Jason Fries
%E Tom Pollard
%E Shengpu Tang
%E Anna Zink
%E Tom Hartvigsen
%E Monica Agrawal
%E Sam Finlayson
%E Benjamin Glicksberg
%E Brett Beaulieu-Jones
%E Kai Wang
%E Daseyra Fontalvo
%E Tasmie Sarker
%E Irene Chen
%E Emily Alsentzer
%F pmlr-v333-ho26a
%I PMLR
%P 880--904
%U https://proceedings.mlr.press/v333/ho26a.html
%V 333
%X Extreme multi-label next-visit diagnosis forecasting from electronic health records is dominated by label sparsity. Each visit contains only a handful of positive ICD-10 codes among thousands of candidates, yet codes are strongly correlated through comorbidity structure. In this regime, standard element-wise objectives (such as focal and class-balanced losses) often maximize sensitivity at the cost of severe precision degradation, producing clinically impractical alert volumes. We propose an architecture-compatible dependency-aware ranking loss that (i) reweights per-code correctness under severe imbalance, (ii) aggregates errors with rank-based emphasis on the hardest labels, and (iii) regularizes predictions with a learned pairwise dependency term in the output space. Using an EHR Transformer backbone, we evaluate on the CPRD cohort ($V{=}1{,}538$ codes), benchmarking loss functions on 200,000 patients and validating scalability up to 3.2 million. The proposed objective shifts the precision–recall trade-off toward fewer false positives while maintaining competitive sensitivity, and preserves overall ranking quality (PRC–AUC comparable to weighted BCE). In addition, it yields an auditable population-level dependency matrix summarizing learned co-occurrence structure. These results suggest that explicit output-space structure can improve the precision–recall trade-off in sparse, high-dimensional next-visit diagnosis prediction from EHRs.

APA

Ho, I.S., Werne, L., Rawlik, K., Guthrie, B. & Seth, S. (2026). Learning Under Extreme Label Imbalance in EHRs: A Dependency-Aware Loss for Multi-Label Classification. Proceedings of the 7th Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 333:880-904. Available from https://proceedings.mlr.press/v333/ho26a.html.
