Labrador: Exploring the limits of masked language modeling for laboratory data

David Bellamy, Bhawesh Kumar, Cindy Wang, Andrew Beam
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:104-129, 2025.

Abstract

In this work we introduce Labrador, a pre-trained Transformer model for laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million lab test results from electronic health records (EHRs) and evaluated on various downstream outcome prediction tasks. Both models demonstrate mastery of the pre-training task, but neither consistently outperforms XGBoost on downstream supervised tasks. Our ablation studies reveal that transfer learning shows limited effectiveness for BERT and achieves only marginal success with Labrador. We explore the reasons for the failure of transfer learning and suggest that, among other factors, the data-generating process underlying each patient cannot be characterized sufficiently using labs alone. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.
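To make the pre-training objective concrete, the sketch below illustrates masked pre-training on (lab code, numeric value) pairs: a fraction of values is masked and the Transformer is trained to reconstruct them. This is a minimal, illustrative example only, not the authors' implementation; all names and hyperparameters (lab_vocab_size, d_model, mask_rate, the MSE reconstruction head, etc.) are assumptions made for demonstration.

```python
# Minimal sketch of masked pre-training on laboratory data (PyTorch).
# NOT the Labrador implementation; hyperparameters and layer choices are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

lab_vocab_size, d_model, mask_rate = 500, 64, 0.15
MASK_ID = lab_vocab_size  # reserve one extra id as the [MASK] lab code

code_emb = nn.Embedding(lab_vocab_size + 1, d_model)  # lab test identity
value_proj = nn.Linear(1, d_model)                    # continuous lab value
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
value_head = nn.Linear(d_model, 1)                    # reconstruct masked values

# Toy batch: 8 "patients", each a bag of 10 (lab code, numeric value) pairs.
codes = torch.randint(0, lab_vocab_size, (8, 10))
values = torch.randn(8, 10)

# Randomly mask a fraction of positions; the model must recover their values.
mask = torch.rand(codes.shape) < mask_rate
masked_codes = codes.masked_fill(mask, MASK_ID)
masked_values = values.masked_fill(mask, 0.0)

h = encoder(code_emb(masked_codes) + value_proj(masked_values.unsqueeze(-1)))
pred = value_head(h).squeeze(-1)
loss = F.mse_loss(pred[mask], values[mask])  # loss computed only on masked positions
loss.backward()
```

For the downstream comparison described in the abstract, the pre-trained encoder's representations would be fine-tuned or fed to a supervised head and evaluated against a tree-based baseline such as XGBoost on the same features.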

Cite this Paper


BibTeX

@InProceedings{pmlr-v259-bellamy25a,
  title     = {Labrador: Exploring the limits of masked language modeling for laboratory data},
  author    = {Bellamy, David and Kumar, Bhawesh and Wang, Cindy and Beam, Andrew},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages     = {104--129},
  year      = {2025},
  editor    = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume    = {259},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/bellamy25a/bellamy25a.pdf},
  url       = {https://proceedings.mlr.press/v259/bellamy25a.html},
  abstract  = {In this work we introduce Labrador, a pre-trained Transformer model for laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million lab test results from electronic health records (EHRs) and evaluated on various downstream outcome prediction tasks. Both models demonstrate mastery of the pre-training task but neither consistently outperform XGBoost on downstream supervised tasks. Our ablation studies reveal that transfer learning shows limited effectiveness for BERT and achieves marginal success with Labrador. We explore the reasons for the failure of transfer learning and suggest that the data generating process underlying each patient cannot be characterized sufficiently using labs alone, among other factors. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.}
}
Endnote
%0 Conference Paper
%T Labrador: Exploring the limits of masked language modeling for laboratory data
%A David Bellamy
%A Bhawesh Kumar
%A Cindy Wang
%A Andrew Beam
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-bellamy25a
%I PMLR
%P 104--129
%U https://proceedings.mlr.press/v259/bellamy25a.html
%V 259
%X In this work we introduce Labrador, a pre-trained Transformer model for laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million lab test results from electronic health records (EHRs) and evaluated on various downstream outcome prediction tasks. Both models demonstrate mastery of the pre-training task but neither consistently outperform XGBoost on downstream supervised tasks. Our ablation studies reveal that transfer learning shows limited effectiveness for BERT and achieves marginal success with Labrador. We explore the reasons for the failure of transfer learning and suggest that the data generating process underlying each patient cannot be characterized sufficiently using labs alone, among other factors. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.
APA
Bellamy, D., Kumar, B., Wang, C. & Beam, A. (2025). Labrador: Exploring the limits of masked language modeling for laboratory data. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:104-129. Available from https://proceedings.mlr.press/v259/bellamy25a.html.
