Training-Free Dataset Pruning for Polyp Segmentation via Community Detection in Similarity Networks
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:1384-1402, 2026.
Abstract
Recent advances in deep learning have been driven by larger datasets and more complex models; however, this progress comes with substantial computational and annotation costs. To address these issues, we introduce *PRIME*, a new training-free dataset pruning method targeting polyp segmentation in medical imaging. *PRIME* constructs a similarity network among the images in the target dataset and then applies community detection to retain a much smaller, yet representative, subset of the original images. Unlike existing methods that require model training for dataset pruning, *PRIME* avoids model training entirely, thus significantly reducing computational demands. The resulting reduction in the training dataset lowers data annotation costs by 56.2% and enables 2.3$\times$ faster training of polyp segmentation models compared to training on the entire annotated dataset, with only a 0.5% drop in the DICE score. Consequently, *PRIME* enables efficient training, fine-tuning, and domain adaptation across medical centers, offering a cost-effective solution for deep learning in polyp segmentation. Our implementation is available at https://github.com/SLDGroup/PRIME.
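
The pipeline described in the abstract (similarity network, then community detection, then a representative subset) can be illustrated with a minimal sketch. This is not the PRIME implementation. The abstract does not state how image similarity is measured, which community detection algorithm is used, or how representatives are chosen, so the sketch assumes precomputed per-image feature vectors, cosine similarity with a hypothetical `threshold`, a simple deterministic label propagation as a stand-in community detector, and the most central member of each community as its representative. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity_graph(features, threshold):
    """Weighted undirected graph: an edge joins images whose similarity >= threshold."""
    adj = {i: {} for i in range(len(features))}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            s = cosine(features[i], features[j])
            if s >= threshold:
                adj[i][j] = s
                adj[j][i] = s
    return adj


def label_propagation(adj, max_iters=100):
    """Stand-in community detector (assumption): each node repeatedly adopts the
    label with the largest total edge weight among its neighbours. Nodes are
    visited in a fixed order and ties go to the smallest label, so the result
    is deterministic."""
    labels = {node: node for node in adj}
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated image keeps its own label
            weight = {}
            for nbr, w in adj[node].items():
                weight[labels[nbr]] = weight.get(labels[nbr], 0.0) + w
            best = max(weight.values())
            new_label = min(l for l, w in weight.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, []).append(node)
    return list(communities.values())


def prune(features, threshold=0.9, keep_per_community=1):
    """Return sorted indices of the images kept after pruning. No model is trained."""
    adj = build_similarity_graph(features, threshold)
    kept = []
    for members in label_propagation(adj):
        member_set = set(members)

        def centrality(i):
            # Total similarity to the other members of the same community.
            return sum(w for j, w in adj[i].items() if j in member_set)

        ranked = sorted(members, key=lambda i: (-centrality(i), i))
        kept.extend(ranked[:keep_per_community])
    return sorted(kept)


# Toy data (hypothetical): two tight groups of near-duplicate images plus one outlier.
toy_features = [
    [1.0, 0.0], [0.99, 0.05], [0.98, 0.1],
    [0.0, 1.0], [0.05, 0.99], [0.1, 0.98],
    [0.7, 0.7],
]
kept = prune(toy_features, threshold=0.95)
print(kept)
```

On this toy input, each near-duplicate group collapses to its most central image and the isolated image is kept as its own community, so only the retained images would need annotation. The all-pairs similarity loop is quadratic in the number of images; a larger dataset would likely need an approximate nearest-neighbour search instead.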