ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy

Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik, Konstantin Donhauser, Jason Hartford, Saber Saberian, Nil Sahin, Ihab Bendidi, Safiye Celik, Juan Sebastián Rodríguez Vera, Marta Fay, Imran S Haque, Oren Kraus
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:29735-29752, 2025.

Abstract

Deriving insights from experimentally generated datasets requires methods that can account for random and systematic measurement errors and remove them in order to accurately represent the underlying effects of the conditions being tested. Here we present a framework for pretraining on large-scale microscopy datasets that includes three steps: (1) curating a set of diverse and self-consistent training samples, (2) scaling training of an appropriate foundation model architecture on this dataset, (3) evaluating intermediate layers of the trained model to identify the best representation for downstream tasks. Using this strategy, we present the largest foundation model for cell microscopy data to our knowledge, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome relationship recall, batch correction replicate consistency, and compound-gene activity prediction benchmarks.
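Step (3) of the framework can be illustrated with a small linear-probe sketch: extract embeddings from each candidate layer, fit a linear classifier on perturbation labels, and keep the layer with the best held-out accuracy. The sketch below is hypothetical and uses synthetic NumPy arrays in place of real ViT activations; `linear_probe_accuracy` and the toy "layers" are illustrative names, not the paper's evaluation code, and a least-squares classifier stands in for whatever probe the authors used.

```python
import numpy as np

def linear_probe_accuracy(feats, labels, n_classes, seed=0):
    """Crude linear-separability score: fit a least-squares linear
    classifier on half the samples, report accuracy on the other half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(feats))
    split = len(feats) // 2
    tr, te = idx[:split], idx[split:]
    # One-hot targets turn multi-class probing into least-squares regression.
    targets = np.eye(n_classes)[labels[tr]]
    weights, *_ = np.linalg.lstsq(feats[tr], targets, rcond=None)
    preds = feats[te] @ weights
    return float((preds.argmax(axis=1) == labels[te]).mean())

# Toy per-layer embeddings: "layer_1" is pure noise, "layer_2" adds a
# strong class-specific offset, so it should be far more separable.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 50)            # 4 perturbations, 50 crops each
noise = rng.normal(size=(200, 32))
separable = noise + 3.0 * np.eye(4)[labels] @ rng.normal(size=(4, 32))

scores = {name: linear_probe_accuracy(f, labels, n_classes=4)
          for name, f in [("layer_1", noise), ("layer_2", separable)]}
best_layer = max(scores, key=scores.get)
```

In practice the same loop would run over the frozen activations of each transformer block, with the paper's benchmarks (relationship recall, replicate consistency, etc.) substituted for the toy accuracy score.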

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kenyon-dean25a,
  title     = {{V}i{T}ally Consistent: Scaling Biological Representation Learning for Cell Microscopy},
  author    = {Kenyon-Dean, Kian and Wang, Zitong Jerry and Urbanik, John and Donhauser, Konstantin and Hartford, Jason and Saberian, Saber and Sahin, Nil and Bendidi, Ihab and Celik, Safiye and Vera, Juan Sebasti\'{a}n Rodr\'{\i}guez and Fay, Marta and Haque, Imran S and Kraus, Oren},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {29735--29752},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kenyon-dean25a/kenyon-dean25a.pdf},
  url       = {https://proceedings.mlr.press/v267/kenyon-dean25a.html},
  abstract  = {Deriving insights from experimentally generated datasets requires methods that can account for random and systematic measurement errors and remove them in order to accurately represent the underlying effects of the conditions being tested. Here we present a framework for pretraining on large-scale microscopy datasets that includes three steps: (1) curating a set of diverse and self-consistent training samples, (2) scaling training of an appropriate foundation model architecture on this dataset, (3) evaluating intermediate layers of the trained model to identify the best representation for downstream tasks. Using this strategy, we present the largest foundation model for cell microscopy data to our knowledge, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome relationship recall, batch correction replicate consistency, and compound-gene activity prediction benchmarks.}
}
Endnote
%0 Conference Paper
%T ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
%A Kian Kenyon-Dean
%A Zitong Jerry Wang
%A John Urbanik
%A Konstantin Donhauser
%A Jason Hartford
%A Saber Saberian
%A Nil Sahin
%A Ihab Bendidi
%A Safiye Celik
%A Juan Sebastián Rodríguez Vera
%A Marta Fay
%A Imran S Haque
%A Oren Kraus
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kenyon-dean25a
%I PMLR
%P 29735--29752
%U https://proceedings.mlr.press/v267/kenyon-dean25a.html
%V 267
%X Deriving insights from experimentally generated datasets requires methods that can account for random and systematic measurement errors and remove them in order to accurately represent the underlying effects of the conditions being tested. Here we present a framework for pretraining on large-scale microscopy datasets that includes three steps: (1) curating a set of diverse and self-consistent training samples, (2) scaling training of an appropriate foundation model architecture on this dataset, (3) evaluating intermediate layers of the trained model to identify the best representation for downstream tasks. Using this strategy, we present the largest foundation model for cell microscopy data to our knowledge, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome relationship recall, batch correction replicate consistency, and compound-gene activity prediction benchmarks.
APA
Kenyon-Dean, K., Wang, Z.J., Urbanik, J., Donhauser, K., Hartford, J., Saberian, S., Sahin, N., Bendidi, I., Celik, S., Vera, J.S.R., Fay, M., Haque, I.S. & Kraus, O. (2025). ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:29735-29752. Available from https://proceedings.mlr.press/v267/kenyon-dean25a.html.
