Comparing representations of biological data learned with different AI paradigms, augmenting and cropping strategies

Andrei Dmitrenko, Mauro M. Masiero, Nicola Zamboni
Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, PMLR 172:353-369, 2022.

Abstract

Recent advances in computer vision and robotics have enabled automated large-scale biological image analysis. Various machine learning approaches have been successfully applied to phenotypic profiling. However, it remains unclear how they compare in terms of biological feature extraction. In this study, we propose a simple CNN architecture and implement four different representation learning approaches. We train 16 deep learning setups on a dataset of 770k cancer cell images under identical conditions, using different augmenting and cropping strategies. We compare the learned representations by evaluating multiple metrics on each of three downstream tasks: i) distance-based similarity analysis of known drugs, ii) classification of drugs versus controls, and iii) clustering within cell lines. We also compare training times and memory usage. Among all tested setups, multi-crops and random augmentations generally improved performance across tasks, as expected. Strikingly, self-supervised (implicit contrastive learning) models showed competitive performance while being up to 11 times faster to train. Self-supervised regularized learning required the most memory and computation to deliver arguably the most informative features. We observe that no single combination of augmenting and cropping strategies consistently results in top performance across tasks, and we recommend prospective research directions.
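The abstract identifies multi-crop and random-augmentation strategies as the main factor varied across the 16 training setups. As a rough illustration only of what such an input pipeline can look like (a SwAV-style multi-crop transform; the crop sizes, augmentation choices, and probabilities below are assumptions for the sketch, not details taken from the paper), one could write in PyTorch/torchvision:

```python
# Illustrative multi-crop + random augmentation pipeline (assumed, not the
# authors' exact setup): a few large "global" crops plus several smaller
# "local" crops per image, each with its own random augmentations.
from torchvision import transforms


class MultiCropTransform:
    """Return a list of differently augmented crops for one input image."""

    def __init__(self, n_global=2, n_local=4):
        # Random colour jitter applied with high probability (values assumed).
        color_jitter = transforms.RandomApply(
            [transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8
        )
        # Large crops covering most of the image.
        self.global_t = transforms.Compose([
            transforms.RandomResizedCrop(128, scale=(0.5, 1.0)),
            transforms.RandomHorizontalFlip(),
            color_jitter,
            transforms.ToTensor(),
        ])
        # Small crops covering local regions.
        self.local_t = transforms.Compose([
            transforms.RandomResizedCrop(64, scale=(0.1, 0.5)),
            transforms.RandomHorizontalFlip(),
            color_jitter,
            transforms.ToTensor(),
        ])
        self.n_global, self.n_local = n_global, n_local

    def __call__(self, image):
        crops = [self.global_t(image) for _ in range(self.n_global)]
        crops += [self.local_t(image) for _ in range(self.n_local)]
        return crops
```

In a setup of this kind, every crop in the returned list is passed through the same CNN encoder and the resulting embeddings are fed to the chosen training objective (supervised, contrastive, or regularized self-supervised); the actual crop sizes, augmentation set, and number of crops used in the paper are not reproduced here.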

Cite this Paper


BibTeX
@InProceedings{pmlr-v172-dmitrenko22a,
  title     = {Comparing representations of biological data learned with different AI paradigms, augmenting and cropping strategies},
  author    = {Dmitrenko, Andrei and Masiero, Mauro M. and Zamboni, Nicola},
  booktitle = {Proceedings of The 5th International Conference on Medical Imaging with Deep Learning},
  pages     = {353--369},
  year      = {2022},
  editor    = {Konukoglu, Ender and Menze, Bjoern and Venkataraman, Archana and Baumgartner, Christian and Dou, Qi and Albarqouni, Shadi},
  volume    = {172},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--08 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v172/dmitrenko22a/dmitrenko22a.pdf},
  url       = {https://proceedings.mlr.press/v172/dmitrenko22a.html}
}
Endnote
%0 Conference Paper
%T Comparing representations of biological data learned with different AI paradigms, augmenting and cropping strategies
%A Andrei Dmitrenko
%A Mauro M. Masiero
%A Nicola Zamboni
%B Proceedings of The 5th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Ender Konukoglu
%E Bjoern Menze
%E Archana Venkataraman
%E Christian Baumgartner
%E Qi Dou
%E Shadi Albarqouni
%F pmlr-v172-dmitrenko22a
%I PMLR
%P 353--369
%U https://proceedings.mlr.press/v172/dmitrenko22a.html
%V 172
APA
Dmitrenko, A., Masiero, M. M., & Zamboni, N. (2022). Comparing representations of biological data learned with different AI paradigms, augmenting and cropping strategies. Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 172:353-369. Available from https://proceedings.mlr.press/v172/dmitrenko22a.html.
