Unlocking the Power of SAM 2 for Few-Shot Segmentation

Qianxiong Xu; Lanyun Zhu; Xuanyi Liu; Guosheng Lin; Cheng Long; Ziyue Li; Rui Zhao

Unlocking the Power of SAM 2 for Few-Shot Segmentation

Qianxiong Xu, Lanyun Zhu, Xuanyi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69950-69966, 2025.

Abstract

Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. To address this, some methods use the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. A simple idea is to encode support foreground (FG) features as memory, with which query FG features are matched and fused. Unfortunately, the FG objects in different frames of SAM 2’s video data are always the same identity, while those in FSS are different identities, i.e., the matching step is incompatible. Therefore, we design Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. However, the memories can never be as accurate as the real ones, i.e., they are likely to contain incomplete query FG, and some unexpected query background (BG) features, leading to wrong segmentation. Hence, we further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise a Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory. Extensive experiments have been conducted on PASCAL-5$^i$ and COCO-20$^i$ to validate the effectiveness of our design, e.g., the 1-shot mIoU can be 4.2% better than the best baseline.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-xu25an,
  title = 	 {Unlocking the Power of {SAM} 2 for Few-Shot Segmentation},
  author =       {Xu, Qianxiong and Zhu, Lanyun and Liu, Xuanyi and Lin, Guosheng and Long, Cheng and Li, Ziyue and Zhao, Rui},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {69950--69966},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25an/xu25an.pdf},
  url = 	 {https://proceedings.mlr.press/v267/xu25an.html},
  abstract = 	 {Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. To address this, some methods use the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. A simple idea is to encode support foreground (FG) features as memory, with which query FG features are matched and fused. Unfortunately, the FG objects in different frames of SAM 2’s video data are always the same identity, while those in FSS are different identities, i.e., the matching step is incompatible. Therefore, we design Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. However, the memories can never be as accurate as the real ones, i.e., they are likely to contain incomplete query FG, and some unexpected query background (BG) features, leading to wrong segmentation. Hence, we further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise a Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory. Extensive experiments have been conducted on PASCAL-5$^i$ and COCO-20$^i$ to validate the effectiveness of our design, e.g., the 1-shot mIoU can be 4.2% better than the best baseline.}
}

Endnote

%0 Conference Paper
%T Unlocking the Power of SAM 2 for Few-Shot Segmentation
%A Qianxiong Xu
%A Lanyun Zhu
%A Xuanyi Liu
%A Guosheng Lin
%A Cheng Long
%A Ziyue Li
%A Rui Zhao
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu	
%F pmlr-v267-xu25an
%I PMLR
%P 69950--69966
%U https://proceedings.mlr.press/v267/xu25an.html
%V 267
%X Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. To address this, some methods use the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. A simple idea is to encode support foreground (FG) features as memory, with which query FG features are matched and fused. Unfortunately, the FG objects in different frames of SAM 2’s video data are always the same identity, while those in FSS are different identities, i.e., the matching step is incompatible. Therefore, we design Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. However, the memories can never be as accurate as the real ones, i.e., they are likely to contain incomplete query FG, and some unexpected query background (BG) features, leading to wrong segmentation. Hence, we further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise a Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory. Extensive experiments have been conducted on PASCAL-5$^i$ and COCO-20$^i$ to validate the effectiveness of our design, e.g., the 1-shot mIoU can be 4.2% better than the best baseline.

APA

Xu, Q., Zhu, L., Liu, X., Lin, G., Long, C., Li, Z. & Zhao, R.. (2025). Unlocking the Power of SAM 2 for Few-Shot Segmentation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69950-69966 Available from https://proceedings.mlr.press/v267/xu25an.html.

Unlocking the Power of SAM 2 for Few-Shot Segmentation

Abstract

Cite this Paper

Related Material