ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology

Tan H. Nguyen, Dinkar Juyal, Jin Li, Aaditya Prakash, Shima Nofallah, Chintan Shah, Sai Chowdary Gullapally, Limin Yu, Michael Griffin, Anand Sampat, John Abel, Justin Lee, Amaro Taylor-Weiner
Proceedings of the MICCAI Workshop on Computational Pathology, PMLR 254:121-130, 2024.

Abstract

Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained on data from a different source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization at test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach poses three significant challenges to scaling such a model. Firstly, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Secondly, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center, etc.) are associated with a single image. We introduce ContriMix, a novel domain-label-free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set while being robust to color variation from rare substances in pathology images. Our source code and pre-trained checkpoints are available at https://gitlab.com/huutan86/intraminibatch_permutation_drit.
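The core augmentation mechanic the abstract describes, pairing each image's content with stain attributes borrowed from other images in the same minibatch, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the `extract_content` and `extract_attribute` functions below are hypothetical stand-ins for the learned encoders, and the elementwise product is a toy stand-in for the learned decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_content(batch):
    # Hypothetical stand-in for the content encoder: per-pixel mean
    # across color channels, a crude stain-invariant signal.
    return batch.mean(axis=-1, keepdims=True)          # (B, H, W, 1)

def extract_attribute(batch):
    # Hypothetical stand-in for the attribute encoder: per-image
    # channel statistics that loosely capture stain color.
    return batch.mean(axis=(1, 2), keepdims=True)      # (B, 1, 1, C)

def contrimix_style_augment(batch, n_mixes=2):
    """Create synthetic images by combining each image's content with
    attributes from randomly permuted images in the same minibatch."""
    content = extract_content(batch)
    attribute = extract_attribute(batch)
    synthetic = []
    for _ in range(n_mixes):
        perm = rng.permutation(len(batch))             # intra-batch pairing
        synthetic.append(content * attribute[perm])    # toy "decoder"
    return np.concatenate(synthetic, axis=0)

batch = rng.random((4, 8, 8, 3))        # minibatch of 4 RGB patches
augmented = contrimix_style_augment(batch, n_mixes=2)
print(augmented.shape)                  # two synthetic variants per image
```

Because the pairing is drawn from the minibatch itself rather than from domain labels, adding data from a new domain requires no relabeling, which is the scalability argument made in the abstract.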

Cite this Paper


BibTeX
@InProceedings{pmlr-v254-nguyen24a, title = {ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology}, author = {Nguyen, Tan H. and Juyal, Dinkar and Li, Jin and Prakash, Aaditya and Nofallah, Shima and Shah, Chintan and Gullapally, Sai Chowdary and Yu, Limin and Griffin, Michael and Sampat, Anand and Abel, John and Lee, Justin and Taylor-Weiner, Amaro}, booktitle = {Proceedings of the MICCAI Workshop on Computational Pathology}, pages = {121--130}, year = {2024}, editor = {Ciompi, Francesco and Khalili, Nadieh and Studer, Linda and Poceviciute, Milda and Khan, Amjad and Veta, Mitko and Jiao, Yiping and Haj-Hosseini, Neda and Chen, Hao and Raza, Shan and Minhas, Fayyaz and Zlobec, Inti and Burlutskiy, Nikolay and Vilaplana, Veronica and Brattoli, Biagio and Muller, Henning and Atzori, Manfredo}, volume = {254}, series = {Proceedings of Machine Learning Research}, month = {06 Oct}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v254/main/assets/nguyen24a/nguyen24a.pdf}, url = {https://proceedings.mlr.press/v254/nguyen24a.html}, abstract = {Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained from a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization during test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach causes three significant challenges to scaling such a model. Firstly, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Secondly, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance.
Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center, etc) are associated with a single image. We introduce ContriMix, a novel domain label free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set while being robust to the color variation from rare substances in pathology images. Our source code and pre-trained checkpoints are available at https://gitlab.com/huutan86/intraminibatch_permutation_drit.} }
Endnote
%0 Conference Paper %T ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology %A Tan H. Nguyen %A Dinkar Juyal %A Jin Li %A Aaditya Prakash %A Shima Nofallah %A Chintan Shah %A Sai Chowdary Gullapally %A Limin Yu %A Michael Griffin %A Anand Sampat %A John Abel %A Justin Lee %A Amaro Taylor-Weiner %B Proceedings of the MICCAI Workshop on Computational Pathology %C Proceedings of Machine Learning Research %D 2024 %E Francesco Ciompi %E Nadieh Khalili %E Linda Studer %E Milda Poceviciute %E Amjad Khan %E Mitko Veta %E Yiping Jiao %E Neda Haj-Hosseini %E Hao Chen %E Shan Raza %E Fayyaz Minhas %E Inti Zlobec %E Nikolay Burlutskiy %E Veronica Vilaplana %E Biagio Brattoli %E Henning Muller %E Manfredo Atzori %F pmlr-v254-nguyen24a %I PMLR %P 121--130 %U https://proceedings.mlr.press/v254/nguyen24a.html %V 254 %X Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained from a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization during test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach causes three significant challenges to scaling such a model. Firstly, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Secondly, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center, etc) are associated with a single image.
We introduce ContriMix, a novel domain label free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set while being robust to the color variation from rare substances in pathology images. Our source code and pre-trained checkpoints are available at https://gitlab.com/huutan86/intraminibatch_permutation_drit.
APA
Nguyen, T.H., Juyal, D., Li, J., Prakash, A., Nofallah, S., Shah, C., Gullapally, S.C., Yu, L., Griffin, M., Sampat, A., Abel, J., Lee, J. & Taylor-Weiner, A. (2024). ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology. Proceedings of the MICCAI Workshop on Computational Pathology, in Proceedings of Machine Learning Research 254:121-130. Available from https://proceedings.mlr.press/v254/nguyen24a.html.