Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Duy Minh Ho Nguyen, An Thai Le, Trung Quoc Nguyen, Nghiem Tuong Diep, Tai Nguyen, Duy Duong-Tran, Jan Peters, Li Shen, Mathias Niepert, Daniel Sonntag
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:687-702, 2025.

Abstract

Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs and often struggle with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on dual contexts: a domain-shared context and class-specific contexts, the latter generated by Large Language Models (LLMs) such as GPT. This dual-prompt design enhances the model’s feature representation by combining implicit and explicit factors encoded in LLM knowledge. Moreover, we employ Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, since exact mass preservation no longer restricts the transport solutions. Furthermore, UOT’s characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.
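
To make the alignment step concrete, the following is a minimal NumPy sketch of entropic unbalanced optimal transport computed with Sinkhorn-style scaling iterations and KL-relaxed marginals. It illustrates the general UOT technique the abstract refers to, not the paper's actual implementation; the function name, the token/prompt sizes, and the cosine-distance cost are assumptions made for the example.

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.05, rho=1.0, n_iters=200):
    """Entropic unbalanced OT via Sinkhorn-like scaling iterations,
    where the marginal constraints are relaxed by KL penalties.

    C   : (n, m) cost matrix between visual tokens and prompt embeddings
    a,b : (n,), (m,) non-negative mass vectors (need not sum to the same total)
    eps : entropic regularization strength
    rho : marginal-relaxation strength (larger -> closer to balanced OT)
    """
    K = np.exp(-C / eps)                      # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    exponent = rho / (rho + eps)              # softens the marginal updates
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-16)) ** exponent
        v = (b / (K.T @ u + 1e-16)) ** exponent
    return u[:, None] * K * v[None, :]        # transport plan T

# Toy alignment: 5 visual tokens vs. 3 prompt embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
vis = rng.normal(size=(5, 16)); vis /= np.linalg.norm(vis, axis=1, keepdims=True)
txt = rng.normal(size=(3, 16)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
C = 1.0 - vis @ txt.T                         # cosine distance as transport cost
T = unbalanced_sinkhorn(C, np.full(5, 1 / 5), np.full(3, 1 / 3))
print(T.round(3))                             # mass on noisy tokens can be shed
```

Because the row and column sums of T are only softly penalized, the plan is free to transport little mass from tokens that match no prompt well, which is the partial-matching behavior described above.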

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-nguyen25c,
  title     = {{Dude}: {D}ual Distribution-Aware Context Prompt Learning For Large Vision-Language Model},
  author    = {Nguyen, Duy Minh Ho and Le, An Thai and Nguyen, Trung Quoc and Diep, Nghiem Tuong and Nguyen, Tai and Duong-Tran, Duy and Peters, Jan and Shen, Li and Niepert, Mathias and Sonntag, Daniel},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages     = {687--702},
  year      = {2025},
  editor    = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume    = {260},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--08 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/nguyen25c/nguyen25c.pdf},
  url       = {https://proceedings.mlr.press/v260/nguyen25c.html},
  abstract  = {Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model’s feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT’s characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.}
}
Endnote
%0 Conference Paper
%T Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model
%A Duy Minh Ho Nguyen
%A An Thai Le
%A Trung Quoc Nguyen
%A Nghiem Tuong Diep
%A Tai Nguyen
%A Duy Duong-Tran
%A Jan Peters
%A Li Shen
%A Mathias Niepert
%A Daniel Sonntag
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-nguyen25c
%I PMLR
%P 687--702
%U https://proceedings.mlr.press/v260/nguyen25c.html
%V 260
%X Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model’s feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT’s characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.
APA
Nguyen, D.M.H., Le, A.T., Nguyen, T.Q., Diep, N.T., Nguyen, T., Duong-Tran, D., Peters, J., Shen, L., Niepert, M. & Sonntag, D. (2025). Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:687-702. Available from https://proceedings.mlr.press/v260/nguyen25c.html.