Incorporating External Information in Tissue Subtyping: A Topic Modeling Approach

Ardvan Saeedi, Payman Yadollahpour, Sumedha Singla, Brian Pollack, William Wells, Frank Sciurba, Kayhan Batmanghelich
Proceedings of the 6th Machine Learning for Healthcare Conference, PMLR 149:478-505, 2021.

Abstract

Probabilistic topic models have been widely deployed for various applications, such as learning disease or tissue subtypes. Yet learning the parameters of such models is usually an ill-posed problem and may result in losing valuable information about disease severity. A common approach is to add a discriminative loss term to the generative model’s loss in order to learn a representation that is also predictive of disease severity. However, finding a balance between these two losses is not straightforward. In this paper, we propose an alternative. We develop a framework that allows external covariates to be incorporated into the generative model’s approximate posterior. These covariates can have more discriminative power for disease severity than the representation that we extract from the posterior distribution. For instance, they can be features extracted from a neural network that predicts disease severity from CT images. Effectively, we constrain the generative model’s approximate posterior to reside in the subspace of these discriminative covariates. We illustrate our method’s application on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD), a highly heterogeneous disease. We aim to identify tissue subtypes by using a variant of a topic model as the generative model. We quantitatively evaluate the predictive performance of the inferred subtypes and demonstrate that our method outperforms, or performs on par with, some reasonable baselines. We also show that some of the discovered subtypes are correlated with genetic measurements, suggesting that the identified subtypes may characterize the disease’s underlying etiology.

Cite this Paper


BibTeX
@InProceedings{pmlr-v149-saeedi21a,
  title = {Incorporating External Information in Tissue Subtyping: A Topic Modeling Approach},
  author = {Saeedi, Ardvan and Yadollahpour, Payman and Singla, Sumedha and Pollack, Brian and Wells, William and Sciurba, Frank and Batmanghelich, Kayhan},
  booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference},
  pages = {478--505},
  year = {2021},
  editor = {Jung, Ken and Yeung, Serena and Sendak, Mark and Sjoding, Michael and Ranganath, Rajesh},
  volume = {149},
  series = {Proceedings of Machine Learning Research},
  month = {06--07 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v149/saeedi21a/saeedi21a.pdf},
  url = {https://proceedings.mlr.press/v149/saeedi21a.html},
  abstract = {Probabilistic topic models have been widely deployed for various applications, such as learning disease or tissue subtypes. Yet learning the parameters of such models is usually an ill-posed problem and may result in losing valuable information about disease severity. A common approach is to add a discriminative loss term to the generative model’s loss in order to learn a representation that is also predictive of disease severity. However, finding a balance between these two losses is not straightforward. In this paper, we propose an alternative. We develop a framework that allows external covariates to be incorporated into the generative model’s approximate posterior. These covariates can have more discriminative power for disease severity than the representation that we extract from the posterior distribution. For instance, they can be features extracted from a neural network that predicts disease severity from CT images. Effectively, we constrain the generative model’s approximate posterior to reside in the subspace of these discriminative covariates. We illustrate our method’s application on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD), a highly heterogeneous disease. We aim to identify tissue subtypes by using a variant of a topic model as the generative model. We quantitatively evaluate the predictive performance of the inferred subtypes and demonstrate that our method outperforms, or performs on par with, some reasonable baselines. We also show that some of the discovered subtypes are correlated with genetic measurements, suggesting that the identified subtypes may characterize the disease’s underlying etiology.}
}
Endnote
%0 Conference Paper
%T Incorporating External Information in Tissue Subtyping: A Topic Modeling Approach
%A Ardvan Saeedi
%A Payman Yadollahpour
%A Sumedha Singla
%A Brian Pollack
%A William Wells
%A Frank Sciurba
%A Kayhan Batmanghelich
%B Proceedings of the 6th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2021
%E Ken Jung
%E Serena Yeung
%E Mark Sendak
%E Michael Sjoding
%E Rajesh Ranganath
%F pmlr-v149-saeedi21a
%I PMLR
%P 478--505
%U https://proceedings.mlr.press/v149/saeedi21a.html
%V 149
%X Probabilistic topic models have been widely deployed for various applications, such as learning disease or tissue subtypes. Yet learning the parameters of such models is usually an ill-posed problem and may result in losing valuable information about disease severity. A common approach is to add a discriminative loss term to the generative model’s loss in order to learn a representation that is also predictive of disease severity. However, finding a balance between these two losses is not straightforward. In this paper, we propose an alternative. We develop a framework that allows external covariates to be incorporated into the generative model’s approximate posterior. These covariates can have more discriminative power for disease severity than the representation that we extract from the posterior distribution. For instance, they can be features extracted from a neural network that predicts disease severity from CT images. Effectively, we constrain the generative model’s approximate posterior to reside in the subspace of these discriminative covariates. We illustrate our method’s application on a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD), a highly heterogeneous disease. We aim to identify tissue subtypes by using a variant of a topic model as the generative model. We quantitatively evaluate the predictive performance of the inferred subtypes and demonstrate that our method outperforms, or performs on par with, some reasonable baselines. We also show that some of the discovered subtypes are correlated with genetic measurements, suggesting that the identified subtypes may characterize the disease’s underlying etiology.
APA
Saeedi, A., Yadollahpour, P., Singla, S., Pollack, B., Wells, W., Sciurba, F. & Batmanghelich, K. (2021). Incorporating External Information in Tissue Subtyping: A Topic Modeling Approach. Proceedings of the 6th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 149:478-505. Available from https://proceedings.mlr.press/v149/saeedi21a.html.
