Towards Agnostic and Holistic Universal Image Segmentation with Bit Diffusion

Jakob Lønborg Christensen, Morten Rieger Hannemose, Anders Dahl, Vedrana Andersen Dahl
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307:49-56, 2026.

Abstract

This paper introduces a diffusion-based framework for universal image segmentation, making agnostic segmentation possible without depending on mask-based frameworks and instead predicting the full segmentation in a holistic manner. We present several key adaptations to diffusion models, which are important in this discrete setting. Notably, we show that a location-aware palette with our 2D gray code ordering improves performance. Adding a final tanh activation function is crucial for discrete data. On optimizing diffusion parameters, the sigmoid loss weighting consistently outperforms alternatives, regardless of the prediction type used, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and introduces unique capabilities, such as principled ambiguity modeling, that these models lack. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could lead to competitive models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v307-christensen26a, title = {Towards Agnostic and Holistic Universal Image Segmentation with Bit Diffusion}, author = {Christensen, Jakob L{\o}nborg and Hannemose, Morten Rieger and Dahl, Anders and Dahl, Vedrana Andersen}, booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)}, pages = {49--56}, year = {2026}, editor = {Kim, Hyeongji and Ramírez Rivera, Adín and Ricaud, Benjamin}, volume = {307}, series = {Proceedings of Machine Learning Research}, month = {06--08 Jan}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v307/main/assets/christensen26a/christensen26a.pdf}, url = {https://proceedings.mlr.press/v307/christensen26a.html}, abstract = {This paper introduces a diffusion-based framework for universal image segmentation, making agnostic segmentation possible without depending on mask-based frameworks and instead predicting the full segmentation in a holistic manner. We present several key adaptations to diffusion models, which are important in this discrete setting. Notably, we show that a location-aware palette with our 2D gray code ordering improves performance. Adding a final tanh activation function is crucial for discrete data. On optimizing diffusion parameters, the sigmoid loss weighting consistently outperforms alternatives, regardless of the prediction type used, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and introduces unique capabilities, such as principled ambiguity modeling, that these models lack. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could lead to competitive models.} }
Endnote
%0 Conference Paper %T Towards Agnostic and Holistic Universal Image Segmentation with Bit Diffusion %A Jakob Lønborg Christensen %A Morten Rieger Hannemose %A Anders Dahl %A Vedrana Andersen Dahl %B Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL) %C Proceedings of Machine Learning Research %D 2026 %E Hyeongji Kim %E Adín Ramírez Rivera %E Benjamin Ricaud %F pmlr-v307-christensen26a %I PMLR %P 49--56 %U https://proceedings.mlr.press/v307/christensen26a.html %V 307 %X This paper introduces a diffusion-based framework for universal image segmentation, making agnostic segmentation possible without depending on mask-based frameworks and instead predicting the full segmentation in a holistic manner. We present several key adaptations to diffusion models, which are important in this discrete setting. Notably, we show that a location-aware palette with our 2D gray code ordering improves performance. Adding a final tanh activation function is crucial for discrete data. On optimizing diffusion parameters, the sigmoid loss weighting consistently outperforms alternatives, regardless of the prediction type used, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and introduces unique capabilities, such as principled ambiguity modeling, that these models lack. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could lead to competitive models.
APA
Christensen, J.L., Hannemose, M.R., Dahl, A. & Dahl, V.A.. (2026). Towards Agnostic and Holistic Universal Image Segmentation with Bit Diffusion. Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 307:49-56 Available from https://proceedings.mlr.press/v307/christensen26a.html.

Related Material