Training-Free Dataset Pruning for Polyp Segmentation via Community Detection in Similarity Networks

Md Mostafijur Rahman, Radu Marculescu
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:1384-1402, 2026.

Abstract

Recent advances in deep learning have been driven by the availability of larger datasets and more complex models; however, this progress comes at the expense of substantial computational and annotation costs. To address these issues, we introduce a new, training-free dataset pruning method, *PRIME*, targeting polyp segmentation in medical imaging. To this end, *PRIME* constructs a similarity network among images in the target dataset and then applies community detection to retain a much smaller, yet representative subset of images from the original dataset. Unlike existing methods that require model training for dataset pruning, our *PRIME* completely avoids model training, thus significantly reducing computational demands. The reduction in the training dataset reduces 56.2% data annotation costs and enables 2.3$\times$ faster training of polyp segmentation models compared to training on the entire annotated dataset, with only a 0.5% drop in the DICE score. Consequently, our *PRIME* enables efficient training, fine-tuning, and domain adaptation across medical centers, thus offering a cost-effective solution for deep learning in polyp segmentation. Our implementation is available at https://github.com/SLDGroup/PRIME.

Cite this Paper


BibTeX
@InProceedings{pmlr-v301-rahman26a, title = {Training-Free Dataset Pruning for Polyp Segmentation via Community Detection in Similarity Networks}, author = {Rahman, Md Mostafijur and Marculescu, Radu}, booktitle = {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning}, pages = {1384--1402}, year = {2026}, editor = {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan}, volume = {301}, series = {Proceedings of Machine Learning Research}, month = {09--11 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v301/main/assets/rahman26a/rahman26a.pdf}, url = {https://proceedings.mlr.press/v301/rahman26a.html}, abstract = {Recent advances in deep learning have been driven by the availability of larger datasets and more complex models; however, this progress comes at the expense of substantial computational and annotation costs. To address these issues, we introduce a new, training-free dataset pruning method, *PRIME*, targeting polyp segmentation in medical imaging. To this end, *PRIME* constructs a similarity network among images in the target dataset and then applies community detection to retain a much smaller, yet representative subset of images from the original dataset. Unlike existing methods that require model training for dataset pruning, our *PRIME* completely avoids model training, thus significantly reducing computational demands. The reduction in the training dataset reduces 56.2% data annotation costs and enables 2.3$\times$ faster training of polyp segmentation models compared to training on the entire annotated dataset, with only a 0.5% drop in the DICE score. Consequently, our *PRIME* enables efficient training, fine-tuning, and domain adaptation across medical centers, thus offering a cost-effective solution for deep learning in polyp segmentation. Our implementation is available at https://github.com/SLDGroup/PRIME.} }
Endnote
%0 Conference Paper %T Training-Free Dataset Pruning for Polyp Segmentation via Community Detection in Similarity Networks %A Md Mostafijur Rahman %A Radu Marculescu %B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning %C Proceedings of Machine Learning Research %D 2026 %E Tolga Tasdizen %E Shireen Elhabian %E Ronald Summers %E Chen Chen %E Lisa Koch %E Yan Zhuang %F pmlr-v301-rahman26a %I PMLR %P 1384--1402 %U https://proceedings.mlr.press/v301/rahman26a.html %V 301 %X Recent advances in deep learning have been driven by the availability of larger datasets and more complex models; however, this progress comes at the expense of substantial computational and annotation costs. To address these issues, we introduce a new, training-free dataset pruning method, *PRIME*, targeting polyp segmentation in medical imaging. To this end, *PRIME* constructs a similarity network among images in the target dataset and then applies community detection to retain a much smaller, yet representative subset of images from the original dataset. Unlike existing methods that require model training for dataset pruning, our *PRIME* completely avoids model training, thus significantly reducing computational demands. The reduction in the training dataset reduces 56.2% data annotation costs and enables 2.3$\times$ faster training of polyp segmentation models compared to training on the entire annotated dataset, with only a 0.5% drop in the DICE score. Consequently, our *PRIME* enables efficient training, fine-tuning, and domain adaptation across medical centers, thus offering a cost-effective solution for deep learning in polyp segmentation. Our implementation is available at https://github.com/SLDGroup/PRIME.
APA
Rahman, M.M. & Marculescu, R.. (2026). Training-Free Dataset Pruning for Polyp Segmentation via Community Detection in Similarity Networks. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:1384-1402 Available from https://proceedings.mlr.press/v301/rahman26a.html.

Related Material