Hierarchical Optimal Transport for Comparing Histopathology Datasets

Anna Yeaton, Rahul G. Krishnan, Rebecca Mieloszyk, David Alvarez-Melis, Grace Huynh
Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, PMLR 172:1459-1469, 2022.

Abstract

Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets similar to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, we propose a principled notion of distance between histopathology datasets based on a hierarchical generalization of optimal transport distances. Our method does not require any training, is agnostic to model type, and preserves much of the hierarchical structure in histopathology datasets imposed by tiling. We apply our method to H&E stained slides from The Cancer Genome Atlas from six different cancer types. We show that our method outperforms a baseline distance in a cancer-type prediction task. Our results also show that our optimal transport distance predicts difficulty of transferability in a tumor vs. normal prediction setting.
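The abstract describes the distance only at a high level. As a rough illustration of the two-level idea (a slide is a set of tiles, a dataset is a set of slides), the Python sketch below computes an inner transport cost between the tile embeddings of every pair of slides, then an outer transport cost between the two datasets that uses the resulting slide-to-slide matrix as its ground cost. This is not the authors' implementation. The function names, the squared Euclidean tile cost, the uniform weights, and the use of entropy-regularized OT (Sinkhorn iterations) in place of exact OT are all assumptions made for illustration, and the tile feature vectors are assumed to come from some fixed, pre-trained encoder.

```python
import math
from typing import List, Sequence

# Hypothetical representation: a tile is a feature vector, a slide is a list of
# tiles, and a dataset is a list of slides.
Tile = Sequence[float]
Slide = List[Tile]
Dataset = List[Slide]


def sq_euclidean(x: Tile, y: Tile) -> float:
    """Squared Euclidean distance between two tile feature vectors (assumed ground cost)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_cost(cost: List[List[float]], reg: float = 0.1, n_iter: int = 500) -> float:
    """Entropy-regularized OT cost between two uniform distributions.

    Returns <P, cost>, where P is the Sinkhorn transport plan. Costs are divided by
    their maximum before exponentiation, so reg acts relative to the largest cost
    and exp(-c / reg) stays well away from underflow.
    """
    n, m = len(cost), len(cost[0])
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-(c / scale) / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan entries are P_ij = u_i * K_ij * v_j; weight the original costs by them.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )


def hierarchical_ot_distance(ds_a: Dataset, ds_b: Dataset, reg: float = 0.1) -> float:
    """Two-level OT: slide-to-slide costs from tile-level OT, then dataset-level OT."""
    # Inner level: OT between the tile sets of every pair of slides.
    slide_costs = [
        [
            sinkhorn_cost([[sq_euclidean(t, s) for s in slide_b] for t in slide_a], reg)
            for slide_b in ds_b
        ]
        for slide_a in ds_a
    ]
    # Outer level: OT between the two datasets, with slide-to-slide costs as ground cost.
    return sinkhorn_cost(slide_costs, reg)


if __name__ == "__main__":
    # Toy 2-D "tile embeddings"; real tiles would be encoder features.
    ds_a = [[[0.0, 0.0], [0.1, 0.0]], [[0.0, 0.1], [0.1, 0.1]]]
    ds_b = [[[5.0, 5.0], [5.1, 5.0]], [[5.0, 5.1], [5.1, 5.1]]]
    print(hierarchical_ot_distance(ds_a, ds_a))  # small, near zero
    print(hierarchical_ot_distance(ds_a, ds_b))  # large
```

In this toy example a dataset compared with itself gets a distance near zero (not exactly zero, because the entropic regularization spreads a little mass off the optimal matching), while two datasets whose tiles lie far apart in feature space get a large distance. Only the slide-to-slide cost matrix is passed to the outer problem, which is what lets the tile-level structure of each slide shape the dataset-level comparison.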

Cite this Paper


BibTeX
@InProceedings{pmlr-v172-yeaton22a,
  title = {Hierarchical Optimal Transport for Comparing Histopathology Datasets},
  author = {Yeaton, Anna and Krishnan, Rahul G. and Mieloszyk, Rebecca and Alvarez-Melis, David and Huynh, Grace},
  booktitle = {Proceedings of The 5th International Conference on Medical Imaging with Deep Learning},
  pages = {1459--1469},
  year = {2022},
  editor = {Konukoglu, Ender and Menze, Bjoern and Venkataraman, Archana and Baumgartner, Christian and Dou, Qi and Albarqouni, Shadi},
  volume = {172},
  series = {Proceedings of Machine Learning Research},
  month = {06--08 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v172/yeaton22a/yeaton22a.pdf},
  url = {https://proceedings.mlr.press/v172/yeaton22a.html},
  abstract = {Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets \emph{similar} to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, we propose a principled notion of distance between histopathology datasets based on a hierarchical generalization of optimal transport distances. Our method does not require any training, is agnostic to model type, and preserves much of the hierarchical structure in histopathology datasets imposed by tiling. We apply our method to H&E stained slides from The Cancer Genome Atlas from six different cancer types. We show that our method outperforms a baseline distance in a cancer-type prediction task. Our results also show that our optimal transport distance predicts difficulty of transferability in a tumor vs. normal prediction setting.}
}
Endnote
%0 Conference Paper
%T Hierarchical Optimal Transport for Comparing Histopathology Datasets
%A Anna Yeaton
%A Rahul G. Krishnan
%A Rebecca Mieloszyk
%A David Alvarez-Melis
%A Grace Huynh
%B Proceedings of The 5th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Ender Konukoglu
%E Bjoern Menze
%E Archana Venkataraman
%E Christian Baumgartner
%E Qi Dou
%E Shadi Albarqouni
%F pmlr-v172-yeaton22a
%I PMLR
%P 1459--1469
%U https://proceedings.mlr.press/v172/yeaton22a.html
%V 172
%X Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets \emph{similar} to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, we propose a principled notion of distance between histopathology datasets based on a hierarchical generalization of optimal transport distances. Our method does not require any training, is agnostic to model type, and preserves much of the hierarchical structure in histopathology datasets imposed by tiling. We apply our method to H&E stained slides from The Cancer Genome Atlas from six different cancer types. We show that our method outperforms a baseline distance in a cancer-type prediction task. Our results also show that our optimal transport distance predicts difficulty of transferability in a tumor vs. normal prediction setting.
APA
Yeaton, A., Krishnan, R.G., Mieloszyk, R., Alvarez-Melis, D. & Huynh, G. (2022). Hierarchical Optimal Transport for Comparing Histopathology Datasets. Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 172:1459-1469. Available from https://proceedings.mlr.press/v172/yeaton22a.html.