Adapter to facilitate Foundation Model Communication for DLO Instance Segmentation

Omkar Joglekar, Shir Kozlovsky, Dotan Di Castro
Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, PMLR 285:269-281, 2024.

Abstract

Classical methods in Digital Communication rely on mixing transmitted signals with carrier frequencies to eliminate signal distortion through noisy channels. Drawing inspiration from these techniques, we present an adapter network that enables CLIPSeg, a text-conditioned semantic segmentation model, to communicate point prompts to the Segment Anything Model (SAM) in the positional embedding space. We showcase our technique on the complex task of Deformable Linear Object (DLO) Instance Segmentation. Our method combines the strong zero-shot generalization capability of SAM and user-friendliness of CLIPSeg to exceed the SOTA performance in DLO Instance Segmentation in terms of DICE Score, while training only 0.7% of the model parameters.
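The abstract describes, at a high level, an adapter that translates CLIPSeg's dense text-conditioned output into point-prompt embeddings consumed by SAM. Below is a minimal, hypothetical PyTorch sketch of what such an adapter module could look like; the module name, tensor shapes, and layer choices are illustrative assumptions and do not reproduce the authors' implementation.

    # Conceptual sketch (not the authors' code): an adapter that maps a
    # CLIPSeg-style heatmap into SAM-style prompt embeddings. Shapes,
    # layer sizes, and names are illustrative assumptions.
    import torch
    import torch.nn as nn

    class PromptAdapter(nn.Module):
        """Maps a coarse segmentation heatmap (B, 1, H, W) to a small set of
        point-prompt embeddings (B, N, D) in a SAM-like positional-embedding space."""
        def __init__(self, num_points: int = 8, embed_dim: int = 256):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.GELU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.GELU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            self.to_prompts = nn.Linear(64 * 4 * 4, num_points * embed_dim)
            self.num_points, self.embed_dim = num_points, embed_dim

        def forward(self, heatmap: torch.Tensor) -> torch.Tensor:
            feats = self.encoder(heatmap).flatten(1)      # (B, 64*4*4)
            prompts = self.to_prompts(feats)              # (B, N*D)
            return prompts.view(-1, self.num_points, self.embed_dim)

    # Example: a CLIPSeg-style heatmap becomes prompt embeddings that a frozen
    # SAM-like mask decoder could consume alongside image embeddings.
    adapter = PromptAdapter()
    dummy_heatmap = torch.rand(2, 1, 352, 352)  # CLIPSeg commonly outputs 352x352 maps
    prompt_embeddings = adapter(dummy_heatmap)
    print(prompt_embeddings.shape)  # torch.Size([2, 8, 256])

In this sketch only the small adapter would be trained (consistent with the paper's claim of training roughly 0.7% of the parameters), while the CLIPSeg and SAM backbones remain frozen.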

Cite this Paper


BibTeX
@InProceedings{pmlr-v285-joglekar24a,
  title     = {Adapter to facilitate Foundation Model Communication for {DLO} Instance Segmentation},
  author    = {Joglekar, Omkar and Kozlovsky, Shir and Castro, Dotan Di},
  booktitle = {Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models},
  pages     = {269--281},
  year      = {2024},
  editor    = {Fumero, Marco and Domine, Clementine and Lähner, Zorah and Crisostomi, Donato and Moschella, Luca and Stachenfeld, Kimberly},
  volume    = {285},
  series    = {Proceedings of Machine Learning Research},
  month     = {14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v285/main/assets/joglekar24a/joglekar24a.pdf},
  url       = {https://proceedings.mlr.press/v285/joglekar24a.html},
  abstract  = {Classical methods in Digital Communication rely on mixing transmitted signals with carrier frequencies to eliminate signal distortion through noisy channels. Drawing inspiration from these techniques, we present an adapter network that enables CLIPSeg, a text-conditioned semantic segmentation model, to communicate point prompts to the Segment Anything Model (SAM) in the positional embedding space. We showcase our technique on the complex task of Deformable Linear Object (DLO) Instance Segmentation. Our method combines the strong zero-shot generalization capability of SAM and user-friendliness of CLIPSeg to exceed the SOTA performance in DLO Instance Segmentation in terms of DICE Score, while training only 0.7% of the model parameters.}
}
Endnote
%0 Conference Paper
%T Adapter to facilitate Foundation Model Communication for DLO Instance Segmentation
%A Omkar Joglekar
%A Shir Kozlovsky
%A Dotan Di Castro
%B Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Donato Crisostomi
%E Luca Moschella
%E Kimberly Stachenfeld
%F pmlr-v285-joglekar24a
%I PMLR
%P 269--281
%U https://proceedings.mlr.press/v285/joglekar24a.html
%V 285
%X Classical methods in Digital Communication rely on mixing transmitted signals with carrier frequencies to eliminate signal distortion through noisy channels. Drawing inspiration from these techniques, we present an adapter network that enables CLIPSeg, a text-conditioned semantic segmentation model, to communicate point prompts to the Segment Anything Model (SAM) in the positional embedding space. We showcase our technique on the complex task of Deformable Linear Object (DLO) Instance Segmentation. Our method combines the strong zero-shot generalization capability of SAM and user-friendliness of CLIPSeg to exceed the SOTA performance in DLO Instance Segmentation in terms of DICE Score, while training only 0.7% of the model parameters.
APA
Joglekar, O., Kozlovsky, S. & Di Castro, D. (2024). Adapter to facilitate Foundation Model Communication for DLO Instance Segmentation. Proceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 285:269-281. Available from https://proceedings.mlr.press/v285/joglekar24a.html.