EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping

Sam Gijsen, Kerstin Ritter
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19480-19504, 2025.

Abstract

Multimodal language modeling has enabled breakthroughs for representation learning, yet remains unexplored in the realm of functional brain data for clinical phenotyping. This paper pioneers EEG-language models (ELMs) trained on clinical reports and 15000 EEGs. We propose to combine multimodal alignment in this novel domain with timeseries cropping and text segmentation, enabling an extension based on multiple instance learning to alleviate misalignment between irrelevant EEG or text segments. Our multimodal models significantly improve over EEG-only models across four clinical evaluations and for the first time enable zero-shot classification as well as retrieval of both neural signals and reports. In sum, these results highlight the potential of ELMs, representing significant progress for clinical applications.
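To make the pairing strategy named in the abstract concrete, the sketch below shows one common way a contrastive objective over cropped EEG windows and segmented report text can be written, together with a multiple-instance similarity that down-weights uninformative crop/segment pairs. This is a minimal illustration of the general recipe, not the paper's exact formulation: the function names, the max-pooling aggregation, and the symmetric InfoNCE loss are assumptions for exposition.

import torch
import torch.nn.functional as F

def info_nce(eeg_emb, txt_emb, temperature=0.07):
    # eeg_emb, txt_emb: (batch, dim) paired EEG-crop and text-segment embeddings
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = eeg_emb @ txt_emb.t() / temperature
    targets = torch.arange(eeg_emb.size(0), device=eeg_emb.device)
    # symmetric cross-entropy: EEG-to-text and text-to-EEG directions
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def mil_similarity(crop_embs, seg_embs):
    # crop_embs: (n_crops, dim) embeddings of crops from one EEG recording
    # seg_embs:  (n_segs, dim) embeddings of segments from the paired report
    sims = F.normalize(crop_embs, dim=-1) @ F.normalize(seg_embs, dim=-1).t()
    # score each crop by its best-matching segment and vice versa, then average,
    # so crops or segments without a relevant counterpart contribute little
    return 0.5 * (sims.max(dim=1).values.mean() +
                  sims.max(dim=0).values.mean())

Under this kind of setup, zero-shot classification reduces to embedding short class-descriptive text prompts and assigning each EEG to the prompt with the highest cosine similarity, and retrieval of signals or reports uses the same similarity matrix in either direction.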

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-gijsen25a,
  title     = {{EEG}-Language Pretraining for Highly Label-Efficient Clinical Phenotyping},
  author    = {Gijsen, Sam and Ritter, Kerstin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {19480--19504},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gijsen25a/gijsen25a.pdf},
  url       = {https://proceedings.mlr.press/v267/gijsen25a.html},
  abstract  = {Multimodal language modeling has enabled breakthroughs for representation learning, yet remains unexplored in the realm of functional brain data for clinical phenotyping. This paper pioneers EEG-language models (ELMs) trained on clinical reports and 15000 EEGs. We propose to combine multimodal alignment in this novel domain with timeseries cropping and text segmentation, enabling an extension based on multiple instance learning to alleviate misalignment between irrelevant EEG or text segments. Our multimodal models significantly improve over EEG-only models across four clinical evaluations and for the first time enable zero-shot classification as well as retrieval of both neural signals and reports. In sum, these results highlight the potential of ELMs, representing significant progress for clinical applications.}
}
EndNote
%0 Conference Paper
%T EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping
%A Sam Gijsen
%A Kerstin Ritter
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-gijsen25a
%I PMLR
%P 19480--19504
%U https://proceedings.mlr.press/v267/gijsen25a.html
%V 267
%X Multimodal language modeling has enabled breakthroughs for representation learning, yet remains unexplored in the realm of functional brain data for clinical phenotyping. This paper pioneers EEG-language models (ELMs) trained on clinical reports and 15000 EEGs. We propose to combine multimodal alignment in this novel domain with timeseries cropping and text segmentation, enabling an extension based on multiple instance learning to alleviate misalignment between irrelevant EEG or text segments. Our multimodal models significantly improve over EEG-only models across four clinical evaluations and for the first time enable zero-shot classification as well as retrieval of both neural signals and reports. In sum, these results highlight the potential of ELMs, representing significant progress for clinical applications.
APA
Gijsen, S. & Ritter, K. (2025). EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:19480-19504. Available from https://proceedings.mlr.press/v267/gijsen25a.html.