OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning

Cong Hua; Qianqian Xu; Zhiyong Yang; Zitai Wang; Shilong Bao; Qingming Huang

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24975-25020, 2025.

Abstract

Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). What’s more, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose $\mathsf{OpenworldAUC}$, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize $\mathsf{OpenworldAUC}$ effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on $\mathsf{OpenworldAUC}$ and other metrics.
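As a rough illustration of what "pairwise instance comparisons" could look like, the sketch below scores every (base, new) pair and counts a pair only when the detector ranks the base sample above the new sample (P1) and both samples are classified correctly (P2). This is one plausible reading of the abstract, not the paper's definition; the function name `pairwise_open_world_score`, its arguments, and the "higher score means base domain" convention are all assumptions made for illustration.

```python
from typing import Sequence


def pairwise_open_world_score(
    base_scores: Sequence[float],
    base_correct: Sequence[bool],
    new_scores: Sequence[float],
    new_correct: Sequence[bool],
) -> float:
    """Fraction of (base, new) pairs that pass detection and classification.

    Assumptions (hypothetical, not taken from the paper):
    - a higher detector score means "more likely base domain";
    - *_correct flags say whether each sample was classified correctly
      within its own domain.
    """
    if len(base_scores) != len(base_correct):
        raise ValueError("base scores and flags must have equal length")
    if len(new_scores) != len(new_correct):
        raise ValueError("new scores and flags must have equal length")
    if len(base_scores) == 0 or len(new_scores) == 0:
        raise ValueError("need at least one base and one new sample")

    hits = 0
    for s_b, ok_b in zip(base_scores, base_correct):
        for s_n, ok_n in zip(new_scores, new_correct):
            # A pair counts only if it is ranked correctly (P1)
            # and both of its samples are classified correctly (P2).
            if s_b > s_n and ok_b and ok_n:
                hits += 1
    # Normalizing by the number of pairs keeps the score independent
    # of how many base versus new samples there are (P3).
    return hits / (len(base_scores) * len(new_scores))


base_scores = [0.9, 0.8, 0.3]
base_correct = [True, False, True]
new_scores = [0.5, 0.1]
new_correct = [True, True]

score = pairwise_open_world_score(base_scores, base_correct, new_scores, new_correct)
# Tripling the new-domain samples changes the base/new ratio but not the score.
score_resampled = pairwise_open_world_score(
    base_scores, base_correct, new_scores * 3, new_correct * 3
)
print(score, score_resampled)
```

Because the count is normalized by the number of pairs, replicating one domain's samples leaves the value unchanged, which is the kind of ratio insensitivity that P3 asks for.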

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-hua25d,
  title = 	 {{O}penworld{AUC}: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning},
  author =       {Hua, Cong and Xu, Qianqian and Yang, Zhiyong and Wang, Zitai and Bao, Shilong and Huang, Qingming},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {24975--25020},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hua25d/hua25d.pdf},
  url = 	 {https://proceedings.mlr.press/v267/hua25d.html},
  abstract = 	 {Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). What’s more, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose $\mathsf{OpenworldAUC}$, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize $\mathsf{OpenworldAUC}$ effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on $\mathsf{OpenworldAUC}$ and other metrics.}
}

Endnote

%0 Conference Paper
%T OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
%A Cong Hua
%A Qianqian Xu
%A Zhiyong Yang
%A Zitai Wang
%A Shilong Bao
%A Qingming Huang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-hua25d
%I PMLR
%P 24975--25020
%U https://proceedings.mlr.press/v267/hua25d.html
%V 267
%X Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). What’s more, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose $\mathsf{OpenworldAUC}$, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize $\mathsf{OpenworldAUC}$ effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on $\mathsf{OpenworldAUC}$ and other metrics.

APA

Hua, C., Xu, Q., Yang, Z., Wang, Z., Bao, S., & Huang, Q. (2025). OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:24975-25020. Available from https://proceedings.mlr.press/v267/hua25d.html.
