LabelG: Consistent Pairwise 3D CT Image and Segmentation Mask Generation via Medical Foundation Models

Lu-Yan Wang, Tzung-Dau Wang, Shang-Hong Lai
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:400-413, 2026.

Abstract

Medical image generation is increasingly used for data augmentation in tasks such as segmentation. However, most existing approaches focus solely on synthesizing high-quality images, while the corresponding segmentation masks are generated separately or may lack structural alignment with the images. To address this limitation, we introduce LabelG, a lightweight module that works with pretrained 3D CT diffusion foundation models to produce paired CT images and segmentation masks in a single sampling pass. LabelG decodes multi-scale latent features using a split-MLP architecture and aggregates predictions via a voting mechanism to yield anatomically coherent image–mask pairs, without requiring ground-truth masks or textual prompts at inference time. Experiments on four CT datasets demonstrate that the generated pairs achieve high visual fidelity and can improve downstream segmentation performance when used to augment limited real data. LabelG offers an efficient and scalable approach for synthesizing paired medical data, helping enhance data efficiency in medical image segmentation.
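
The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical Python/PyTorch sketch of a split-MLP head that decodes multi-scale latent features and aggregates per-scale predictions by majority voting; the class names, channel counts, tensor shapes, and voting rule are illustrative assumptions and not the authors' implementation.

# Minimal sketch (not the authors' code): one small MLP per latent scale,
# each producing per-voxel mask logits, followed by a majority vote across
# scales. Shapes and channel counts are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleMLP(nn.Module):
    """Decodes one latent scale into per-voxel mask logits via a small MLP."""

    def __init__(self, in_channels: int, hidden: int, num_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, feat: torch.Tensor, out_size) -> torch.Tensor:
        # feat: (B, C, D, H, W) latent features at one scale.
        b, c, d, h, w = feat.shape
        logits = self.mlp(feat.permute(0, 2, 3, 4, 1).reshape(-1, c))
        logits = logits.view(b, d, h, w, -1).permute(0, 4, 1, 2, 3)
        # Upsample the per-scale logits to the target volume size before voting.
        return F.interpolate(logits, size=out_size, mode="trilinear",
                             align_corners=False)


class SplitMLPVotingHead(nn.Module):
    """One MLP per latent scale; hard predictions are aggregated by majority vote."""

    def __init__(self, channels_per_scale, hidden=256, num_classes=2):
        super().__init__()
        self.heads = nn.ModuleList(
            ScaleMLP(c, hidden, num_classes) for c in channels_per_scale
        )
        self.num_classes = num_classes

    def forward(self, feats, out_size):
        # feats: list of latent tensors, one per scale, e.g. from a 3D diffusion UNet.
        per_scale_logits = [head(f, out_size) for head, f in zip(self.heads, feats)]
        # Majority vote over the per-scale hard predictions.
        votes = torch.stack([l.argmax(dim=1) for l in per_scale_logits], dim=0)
        counts = F.one_hot(votes, self.num_classes).sum(dim=0)  # (B, D, H, W, K)
        mask = counts.argmax(dim=-1)                            # (B, D, H, W)
        return mask, per_scale_logits


if __name__ == "__main__":
    # Toy usage: three latent scales from a hypothetical 3D diffusion backbone.
    feats = [torch.randn(1, 64, 8, 8, 8),
             torch.randn(1, 128, 4, 4, 4),
             torch.randn(1, 256, 2, 2, 2)]
    head = SplitMLPVotingHead([64, 128, 256], num_classes=2)
    mask, _ = head(feats, out_size=(32, 32, 32))
    print(mask.shape)  # torch.Size([1, 32, 32, 32])
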

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-wang26c,
  title     = {LabelG: Consistent Pairwise 3D CT Image and Segmentation Mask Generation via Medical Foundation Models},
  author    = {Wang, Lu-Yan and Wang, Tzung-Dau and Lai, Shang-Hong},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages     = {400--413},
  year      = {2026},
  editor    = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume    = {315},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/wang26c/wang26c.pdf},
  url       = {https://proceedings.mlr.press/v315/wang26c.html},
  abstract  = {Medical image generation is increasingly used for data augmentation in tasks such as segmentation. However, most existing approaches focus solely on synthesizing high-quality images, while the corresponding segmentation masks are generated separately or may lack structural alignment with the images. To address this limitation, we introduce LabelG, a lightweight module that works with pretrained 3D CT diffusion foundation models to produce paired CT images and segmentation masks in a single sampling pass. LabelG decodes multi-scale latent features using a split-MLP architecture and aggregates predictions via a voting mechanism to yield anatomically coherent image–mask pairs, without requiring ground-truth masks or textual prompts at inference time. Experiments on four CT datasets demonstrate that the generated pairs achieve high visual fidelity and can improve downstream segmentation performance when used to augment limited real data. LabelG offers an efficient and scalable approach for synthesizing paired medical data, helping enhance data efficiency in medical image segmentation.}
}
Endnote
%0 Conference Paper
%T LabelG: Consistent Pairwise 3D CT Image and Segmentation Mask Generation via Medical Foundation Models
%A Lu-Yan Wang
%A Tzung-Dau Wang
%A Shang-Hong Lai
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-wang26c
%I PMLR
%P 400--413
%U https://proceedings.mlr.press/v315/wang26c.html
%V 315
%X Medical image generation is increasingly used for data augmentation in tasks such as segmentation. However, most existing approaches focus solely on synthesizing high-quality images, while the corresponding segmentation masks are generated separately or may lack structural alignment with the images. To address this limitation, we introduce LabelG, a lightweight module that works with pretrained 3D CT diffusion foundation models to produce paired CT images and segmentation masks in a single sampling pass. LabelG decodes multi-scale latent features using a split-MLP architecture and aggregates predictions via a voting mechanism to yield anatomically coherent image–mask pairs, without requiring ground-truth masks or textual prompts at inference time. Experiments on four CT datasets demonstrate that the generated pairs achieve high visual fidelity and can improve downstream segmentation performance when used to augment limited real data. LabelG offers an efficient and scalable approach for synthesizing paired medical data, helping enhance data efficiency in medical image segmentation.
APA
Wang, L., Wang, T. & Lai, S. (2026). LabelG: Consistent Pairwise 3D CT Image and Segmentation Mask Generation via Medical Foundation Models. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:400-413. Available from https://proceedings.mlr.press/v315/wang26c.html.