End-to-end Training of Deep Probabilistic CCA on Paired Biomedical Observations

Gregory Gundersen, Bianca Dumitrascu, Jordan T. Ash, Barbara E. Engelhardt
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:945-955, 2020.

Abstract

Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we develop a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. Our method is built around probabilistic canonical correlation analysis (PCCA), which is fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. Using a differentiable take on the EM algorithm, we train the model end-to-end so that the PCCA and neural network parameters are estimated simultaneously. We demonstrate the utility of this method in constructing image features that are predictive of gene expression levels on simulated data and the Genotype-Tissue Expression data. We demonstrate that the latent variables are interpretable by disentangling the latent subspace through shared and modality-specific views.

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-gundersen20a,
  title     = {End-to-end Training of Deep Probabilistic CCA on Paired Biomedical Observations},
  author    = {Gundersen, Gregory and Dumitrascu, Bianca and Ash, Jordan T. and Engelhardt, Barbara E.},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {945--955},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/gundersen20a/gundersen20a.pdf},
  url       = {https://proceedings.mlr.press/v115/gundersen20a.html},
  abstract  = {Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we develop a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. Our method is built around probabilistic canonical correlation analysis (PCCA), which is fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. Using a differentiable take on the EM algorithm, we train the model end-to-end so that the PCCA and neural network parameters are estimated simultaneously. We demonstrate the utility of this method in constructing image features that are predictive of gene expression levels on simulated data and the Genotype-Tissue Expression data. We demonstrate that the latent variables are interpretable by disentangling the latent subspace through shared and modality-specific views.}
}
Endnote
%0 Conference Paper
%T End-to-end Training of Deep Probabilistic CCA on Paired Biomedical Observations
%A Gregory Gundersen
%A Bianca Dumitrascu
%A Jordan T. Ash
%A Barbara E. Engelhardt
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-gundersen20a
%I PMLR
%P 945--955
%U https://proceedings.mlr.press/v115/gundersen20a.html
%V 115
%X Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we develop a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. Our method is built around probabilistic canonical correlation analysis (PCCA), which is fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. Using a differentiable take on the EM algorithm, we train the model end-to-end so that the PCCA and neural network parameters are estimated simultaneously. We demonstrate the utility of this method in constructing image features that are predictive of gene expression levels on simulated data and the Genotype-Tissue Expression data. We demonstrate that the latent variables are interpretable by disentangling the latent subspace through shared and modality-specific views.
APA
Gundersen, G., Dumitrascu, B., Ash, J. T., & Engelhardt, B. E. (2020). End-to-end Training of Deep Probabilistic CCA on Paired Biomedical Observations. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:945-955. Available from https://proceedings.mlr.press/v115/gundersen20a.html.
