Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

Xinyang Liu, Dongsheng Wang, Bowei Fang, Miaoge Li, Yishi Xu, Zhibin Duan, Bo Chen, Mingyuan Zhou
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:2309-2330, 2024.

Abstract

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistic distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results on over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively.
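
The two ideas named in the abstract, hierarchical generation of stochastic prompts and a patch-prompt alignment term, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the latent distribution, the generator, or the statistical distance, so the sketch assumes a diagonal Gaussian latent, a single linear layer as the "lightweight generative model", and a symmetric nearest-neighbour cosine cost as a stand-in distance. All function names and dimensions are hypothetical.

```python
import math
import random


def sample_latent(mu, log_sigma, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, 1) (assumed Gaussian latent)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]


def generate_prompt(z, weights, bias):
    """Hypothetical lightweight generator: one linear layer mapping z to a prompt embedding."""
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def patch_prompt_distance(patches, prompts):
    """Stand-in alignment cost (an assumption, not the paper's distance):
    average nearest-neighbour cosine distance in both directions."""
    patch_to_prompt = sum(
        min(cosine_distance(p, t) for t in prompts) for p in patches
    ) / len(patches)
    prompt_to_patch = sum(
        min(cosine_distance(p, t) for p in patches) for t in prompts
    ) / len(prompts)
    return 0.5 * (patch_to_prompt + prompt_to_patch)


rng = random.Random(0)
latent_dim, embed_dim, n_prompts, n_patches = 4, 8, 3, 5

# Parameters that would be learned during tuning; fixed here for illustration.
mu = [0.0] * latent_dim
log_sigma = [0.0] * latent_dim
weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(embed_dim)]
bias = [0.0] * embed_dim

# Step 1: sample a latent vector; step 2: map it to a prompt embedding.
prompts = [
    generate_prompt(sample_latent(mu, log_sigma, rng), weights, bias)
    for _ in range(n_prompts)
]

# Random vectors standing in for visual patch embeddings from an image encoder.
patches = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(n_patches)]

alignment_cost = patch_prompt_distance(patches, prompts)
print(f"alignment cost: {alignment_cost:.4f}")
```

In a tuning loop, a term like `alignment_cost` would be added to the classification loss as a regularizer, so that each sampled prompt is pulled toward some region of the image rather than fitted only to the training labels.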

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-liu24b,
  title = {Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models},
  author = {Liu, Xinyang and Wang, Dongsheng and Fang, Bowei and Li, Miaoge and Xu, Yishi and Duan, Zhibin and Chen, Bo and Zhou, Mingyuan},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages = {2309--2330},
  year = {2024},
  editor = {Kiyavash, Negar and Mooij, Joris M.},
  volume = {244},
  series = {Proceedings of Machine Learning Research},
  month = {15--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/liu24b/liu24b.pdf},
  url = {https://proceedings.mlr.press/v244/liu24b.html},
  abstract = {For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistic distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results on over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively.}
}
Endnote
%0 Conference Paper
%T Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
%A Xinyang Liu
%A Dongsheng Wang
%A Bowei Fang
%A Miaoge Li
%A Yishi Xu
%A Zhibin Duan
%A Bo Chen
%A Mingyuan Zhou
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-liu24b
%I PMLR
%P 2309--2330
%U https://proceedings.mlr.press/v244/liu24b.html
%V 244
%X For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistic distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results on over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively.
APA
Liu, X., Wang, D., Fang, B., Li, M., Xu, Y., Duan, Z., Chen, B., & Zhou, M. (2024). Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:2309-2330. Available from https://proceedings.mlr.press/v244/liu24b.html.
