Confidence-Aware Contrastive Distillation for Test-time Prompt Tuning
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:660-666, 2025.
Abstract
Pre-trained vision-language models such as CLIP show strong performance on a wide range of visual recognition tasks but often generalize poorly under distribution shift. Test-Time Prompt Tuning (TPT) is a promising remedy that adapts prompt embeddings at inference time by minimizing prediction entropy on unlabeled test data while keeping the vision and text encoders frozen. However, entropy-based tuning lacks structural regularization and can lead to overconfident misclassifications. In this paper, we introduce Confidence-Aware Contrastive Distillation (CaCoD), a lightweight and effective approach for improving the robustness and calibration of TPT. Our method exploits the confidence structure of test-time predictions: it identifies high- and low-confidence samples and aligns their feature representations through a contrastive distillation loss. This encourages semantically meaningful updates to the prompt embeddings without requiring labels or retraining. Experiments across 11 fine-grained datasets demonstrate that CaCoD consistently reduces calibration error and improves predictive reliability while maintaining strong accuracy. Our approach is model-agnostic and plugs easily into existing TPT pipelines.
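The two losses the abstract describes can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's implementation: the function name `cacod_step`, the entropy-quantile split `conf_frac`, and the InfoNCE-style temperature `tau` are all assumptions. High-confidence (low-entropy) samples drive the usual TPT entropy-minimization term and also define per-class feature prototypes; low-confidence samples are then pulled toward the prototype of their predicted class by a contrastive distillation term.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def cacod_step(logits, feats, conf_frac=0.3, tau=0.1):
    """Hypothetical sketch of a CaCoD-style objective.

    logits: (N, C) class scores from CLIP for N test samples.
    feats:  (N, d) image features for the same samples.
    Returns (entropy_loss, contrastive_distillation_loss).
    """
    p = softmax(logits)
    H = entropy(p)
    cut = np.quantile(H, conf_frac)          # entropy threshold (assumed split rule)
    hi = H <= cut                            # high-confidence = low-entropy
    lo = ~hi

    # TPT-style entropy minimization over confident samples
    ent_loss = H[hi].mean()

    # Class prototypes from high-confidence features (L2-normalized)
    preds = p.argmax(-1)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    classes = np.unique(preds[hi])
    protos = np.stack([f[hi & (preds == c)].mean(0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    # Contrastive distillation: each low-confidence feature should be
    # closest to the prototype of its own predicted class (InfoNCE-style)
    keep = lo & np.isin(preds, classes)
    if keep.sum() == 0 or len(classes) < 2:
        return ent_loss, 0.0
    sims = f[keep] @ protos.T / tau
    tgt = np.searchsorted(classes, preds[keep])
    m = sims.max(1, keepdims=True)
    logp = sims - m - np.log(np.exp(sims - m).sum(1, keepdims=True))
    cd_loss = -logp[np.arange(len(tgt)), tgt].mean()
    return ent_loss, cd_loss
```

In an actual TPT pipeline the sum of these two terms would be backpropagated (e.g. in PyTorch) into the prompt embeddings only, with both encoders frozen; the NumPy version above just makes the loss structure concrete.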