LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation

Piyush Tiwary, Kinjawl Bhattacharyya, Prathosh Ap
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:59687-59722, 2025.

Abstract

Medical image segmentation models often struggle to generalize across different domains due to various reasons. Domain Generalization (DG) methods overcome this either through representation learning or data augmentation (DA). While representation learning methods seek domain-invariant features, they often rely on ad-hoc techniques and lack formal guarantees. DA methods, which enrich model representations through synthetic samples, have shown comparable or superior performance to representation learning approaches. We propose LangDAug, a novel Langevin Data Augmentation for multi-source domain generalization in 2D medical image segmentation. LangDAug leverages Energy-Based Models (EBMs) trained via contrastive divergence to traverse between source domains, generating intermediate samples through Langevin dynamics. Theoretical analysis shows that LangDAug induces a regularization effect, and for GLMs, it upper-bounds the Rademacher complexity by the intrinsic dimensionality of the data manifold. Through extensive experiments on Fundus segmentation and 2D MRI prostate segmentation benchmarks, we show that LangDAug outperforms state-of-the-art domain generalization methods and effectively complements existing domain-randomization approaches. The codebase for our method is available at https://github.com/backpropagator/LangDAug.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-tiwary25a, title = {{L}ang{DA}ug: {L}angevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation}, author = {Tiwary, Piyush and Bhattacharyya, Kinjawl and Ap, Prathosh}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {59687--59722}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/tiwary25a/tiwary25a.pdf}, url = {https://proceedings.mlr.press/v267/tiwary25a.html}, abstract = {Medical image segmentation models often struggle to generalize across different domains due to various reasons. Domain Generalization (DG) methods overcome this either through representation learning or data augmentation (DA). While representation learning methods seek domain-invariant features, they often rely on ad-hoc techniques and lack formal guarantees. DA methods, which enrich model representations through synthetic samples, have shown comparable or superior performance to representation learning approaches. We propose LangDAug, a novel Langevin Data Augmentation for multi-source domain generalization in 2D medical image segmentation. LangDAug leverages Energy-Based Models (EBMs) trained via contrastive divergence to traverse between source domains, generating intermediate samples through Langevin dynamics. Theoretical analysis shows that LangDAug induces a regularization effect, and for GLMs, it upper-bounds the Rademacher complexity by the intrinsic dimensionality of the data manifold. Through extensive experiments on Fundus segmentation and 2D MRI prostate segmentation benchmarks, we show that LangDAug outperforms state-of-the-art domain generalization methods and effectively complements existing domain-randomization approaches. The codebase for our method is available at https://github.com/backpropagator/LangDAug.} }
Endnote
%0 Conference Paper %T LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation %A Piyush Tiwary %A Kinjawl Bhattacharyya %A Prathosh Ap %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-tiwary25a %I PMLR %P 59687--59722 %U https://proceedings.mlr.press/v267/tiwary25a.html %V 267 %X Medical image segmentation models often struggle to generalize across different domains due to various reasons. Domain Generalization (DG) methods overcome this either through representation learning or data augmentation (DA). While representation learning methods seek domain-invariant features, they often rely on ad-hoc techniques and lack formal guarantees. DA methods, which enrich model representations through synthetic samples, have shown comparable or superior performance to representation learning approaches. We propose LangDAug, a novel Langevin Data Augmentation for multi-source domain generalization in 2D medical image segmentation. LangDAug leverages Energy-Based Models (EBMs) trained via contrastive divergence to traverse between source domains, generating intermediate samples through Langevin dynamics. Theoretical analysis shows that LangDAug induces a regularization effect, and for GLMs, it upper-bounds the Rademacher complexity by the intrinsic dimensionality of the data manifold. Through extensive experiments on Fundus segmentation and 2D MRI prostate segmentation benchmarks, we show that LangDAug outperforms state-of-the-art domain generalization methods and effectively complements existing domain-randomization approaches. The codebase for our method is available at https://github.com/backpropagator/LangDAug.
APA
Tiwary, P., Bhattacharyya, K. & Ap, P.. (2025). LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:59687-59722 Available from https://proceedings.mlr.press/v267/tiwary25a.html.

Related Material