The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:27005-27020, 2025.

Abstract

The past few years have seen remarkable progress in the decoding of speech from brain activity, primarily driven by large single-subject datasets. However, due to individual variation, such as anatomy, and differences in task design and scanning hardware, leveraging data across subjects and datasets remains challenging. In turn, the field has not benefited from the growing number of open neural data repositories to exploit large-scale deep learning. To address this, we develop neuroscience-informed self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects. It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive data. These advances unlock the potential for scaling speech decoding models beyond the current frontier.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jayalath25a,
  title     = {The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning},
  author    = {Jayalath, Dulhan and Landau, Gilad and Shillingford, Brendan and Woolrich, Mark and Parker Jones, Oiwi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {27005--27020},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jayalath25a/jayalath25a.pdf},
  url       = {https://proceedings.mlr.press/v267/jayalath25a.html},
  abstract  = {The past few years have seen remarkable progress in the decoding of speech from brain activity, primarily driven by large single-subject datasets. However, due to individual variation, such as anatomy, and differences in task design and scanning hardware, leveraging data across subjects and datasets remains challenging. In turn, the field has not benefited from the growing number of open neural data repositories to exploit large-scale deep learning. To address this, we develop neuroscience-informed self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects. It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive data. These advances unlock the potential for scaling speech decoding models beyond the current frontier.}
}
Endnote
%0 Conference Paper
%T The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
%A Dulhan Jayalath
%A Gilad Landau
%A Brendan Shillingford
%A Mark Woolrich
%A Oiwi Parker Jones
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jayalath25a
%I PMLR
%P 27005--27020
%U https://proceedings.mlr.press/v267/jayalath25a.html
%V 267
%X The past few years have seen remarkable progress in the decoding of speech from brain activity, primarily driven by large single-subject datasets. However, due to individual variation, such as anatomy, and differences in task design and scanning hardware, leveraging data across subjects and datasets remains challenging. In turn, the field has not benefited from the growing number of open neural data repositories to exploit large-scale deep learning. To address this, we develop neuroscience-informed self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects. It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive data. These advances unlock the potential for scaling speech decoding models beyond the current frontier.
APA
Jayalath, D., Landau, G., Shillingford, B., Woolrich, M. & Parker Jones, O. (2025). The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:27005-27020. Available from https://proceedings.mlr.press/v267/jayalath25a.html.