Medical diffusion on a budget: Textual Inversion for medical image generation

Bram de Wilde, Anindo Saha, Maarten de Rooij, Henkjan Huisman, Geert Litjens
Proceedings of The 7nd International Conference on Medical Imaging with Deep Learning, PMLR 250:1687-1706, 2024.

Abstract

Diffusion models for text-to-image generation, known for their efficiency, accessibility, and quality, have gained popularity. While inference with these systems on consumer-grade GPUs is increasingly feasible, training from scratch requires large captioned datasets and significant computational resources. In medical image generation, limited availability of large, publicly accessible datasets with text reports poses challenges due to legal and ethical concerns. This work shows that adapting pre-trained Stable Diffusion models to medical imaging modalities is achievable by training text embeddings using Textual Inversion.In this study, we experimented with small medical datasets (100 samples each from three modalities) and trained within hours to generate diagnostically accurate images, as judged by an expert radiologist. Experiments with Textual Inversion training and inference parameters reveal the necessity of larger embeddings and more examples in the medical domain. Classification experiments show an increase in diagnostic accuracy (AUC) for detecting prostate cancer on MRI, from 0.78 to 0.80. Further experiments demonstrate embedding flexibility through disease interpolation, combining pathologies, and inpainting for precise disease appearance control. Notably, the trained embeddings are compact (less than 1 MB), enabling easy data sharing with reduced privacy concerns.

Cite this Paper


BibTeX
@InProceedings{pmlr-v250-wilde24a, title = {Medical diffusion on a budget: Textual Inversion for medical image generation}, author = {de Wilde, Bram and Saha, Anindo and de Rooij, Maarten and Huisman, Henkjan and Litjens, Geert}, booktitle = {Proceedings of The 7nd International Conference on Medical Imaging with Deep Learning}, pages = {1687--1706}, year = {2024}, editor = {Burgos, Ninon and Petitjean, Caroline and Vakalopoulou, Maria and Christodoulidis, Stergios and Coupe, Pierrick and Delingette, Hervé and Lartizien, Carole and Mateus, Diana}, volume = {250}, series = {Proceedings of Machine Learning Research}, month = {03--05 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v250/main/assets/wilde24a/wilde24a.pdf}, url = {https://proceedings.mlr.press/v250/wilde24a.html}, abstract = {Diffusion models for text-to-image generation, known for their efficiency, accessibility, and quality, have gained popularity. While inference with these systems on consumer-grade GPUs is increasingly feasible, training from scratch requires large captioned datasets and significant computational resources. In medical image generation, limited availability of large, publicly accessible datasets with text reports poses challenges due to legal and ethical concerns. This work shows that adapting pre-trained Stable Diffusion models to medical imaging modalities is achievable by training text embeddings using Textual Inversion.In this study, we experimented with small medical datasets (100 samples each from three modalities) and trained within hours to generate diagnostically accurate images, as judged by an expert radiologist. Experiments with Textual Inversion training and inference parameters reveal the necessity of larger embeddings and more examples in the medical domain. Classification experiments show an increase in diagnostic accuracy (AUC) for detecting prostate cancer on MRI, from 0.78 to 0.80. Further experiments demonstrate embedding flexibility through disease interpolation, combining pathologies, and inpainting for precise disease appearance control. Notably, the trained embeddings are compact (less than 1 MB), enabling easy data sharing with reduced privacy concerns.} }
Endnote
%0 Conference Paper %T Medical diffusion on a budget: Textual Inversion for medical image generation %A Bram de Wilde %A Anindo Saha %A Maarten de Rooij %A Henkjan Huisman %A Geert Litjens %B Proceedings of The 7nd International Conference on Medical Imaging with Deep Learning %C Proceedings of Machine Learning Research %D 2024 %E Ninon Burgos %E Caroline Petitjean %E Maria Vakalopoulou %E Stergios Christodoulidis %E Pierrick Coupe %E Hervé Delingette %E Carole Lartizien %E Diana Mateus %F pmlr-v250-wilde24a %I PMLR %P 1687--1706 %U https://proceedings.mlr.press/v250/wilde24a.html %V 250 %X Diffusion models for text-to-image generation, known for their efficiency, accessibility, and quality, have gained popularity. While inference with these systems on consumer-grade GPUs is increasingly feasible, training from scratch requires large captioned datasets and significant computational resources. In medical image generation, limited availability of large, publicly accessible datasets with text reports poses challenges due to legal and ethical concerns. This work shows that adapting pre-trained Stable Diffusion models to medical imaging modalities is achievable by training text embeddings using Textual Inversion.In this study, we experimented with small medical datasets (100 samples each from three modalities) and trained within hours to generate diagnostically accurate images, as judged by an expert radiologist. Experiments with Textual Inversion training and inference parameters reveal the necessity of larger embeddings and more examples in the medical domain. Classification experiments show an increase in diagnostic accuracy (AUC) for detecting prostate cancer on MRI, from 0.78 to 0.80. Further experiments demonstrate embedding flexibility through disease interpolation, combining pathologies, and inpainting for precise disease appearance control. Notably, the trained embeddings are compact (less than 1 MB), enabling easy data sharing with reduced privacy concerns.
APA
de Wilde, B., Saha, A., de Rooij, M., Huisman, H. & Litjens, G.. (2024). Medical diffusion on a budget: Textual Inversion for medical image generation. Proceedings of The 7nd International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 250:1687-1706 Available from https://proceedings.mlr.press/v250/wilde24a.html.

Related Material