Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

Raviteja Vemulapalli; Hadi Pouransari; Fartash Faghri; Sachin Mehta; Mehrdad Farajtabar; Mohammad Rastegari; Oncel Tuzel

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49345-49367, 2024.

Abstract

Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?", and propose a simple task-oriented knowledge transfer approach as a highly effective solution to this problem. Our experimental results on five target tasks show that the proposed approach outperforms task-agnostic VFM distillation, web-scale CLIP pretraining, supervised ImageNet pretraining, and self-supervised DINO pretraining by up to 11.6%, 22.1%, 13.7%, and 29.8%, respectively. Furthermore, the proposed approach also demonstrates up to 9x, 4x and 15x reduction in pretraining compute cost when compared to task-agnostic VFM distillation, ImageNet pretraining and DINO pretraining, respectively, while outperforming them. We also show that the dataset used for transferring knowledge has a significant effect on the final target task performance, and introduce a retrieval-augmented knowledge transfer strategy that uses web-scale image retrieval to curate effective transfer sets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-vemulapalli24a,
  title = 	 {Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models},
  author =       {Vemulapalli, Raviteja and Pouransari, Hadi and Faghri, Fartash and Mehta, Sachin and Farajtabar, Mehrdad and Rastegari, Mohammad and Tuzel, Oncel},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {49345--49367},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/vemulapalli24a/vemulapalli24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/vemulapalli24a.html},
  abstract = 	 {Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?", and propose a simple task-oriented knowledge transfer approach as a highly effective solution to this problem. Our experimental results on five target tasks show that the proposed approach outperforms task-agnostic VFM distillation, web-scale CLIP pretraining, supervised ImageNet pretraining, and self-supervised DINO pretraining by up to 11.6%, 22.1%, 13.7%, and 29.8%, respectively. Furthermore, the proposed approach also demonstrates up to 9x, 4x and 15x reduction in pretraining compute cost when compared to task-agnostic VFM distillation, ImageNet pretraining and DINO pretraining, respectively, while outperforming them. We also show that the dataset used for transferring knowledge has a significant effect on the final target task performance, and introduce a retrieval-augmented knowledge transfer strategy that uses web-scale image retrieval to curate effective transfer sets.}
}

Endnote

%0 Conference Paper
%T Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
%A Raviteja Vemulapalli
%A Hadi Pouransari
%A Fartash Faghri
%A Sachin Mehta
%A Mehrdad Farajtabar
%A Mohammad Rastegari
%A Oncel Tuzel
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-vemulapalli24a
%I PMLR
%P 49345--49367
%U https://proceedings.mlr.press/v235/vemulapalli24a.html
%V 235
%X Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?", and propose a simple task-oriented knowledge transfer approach as a highly effective solution to this problem. Our experimental results on five target tasks show that the proposed approach outperforms task-agnostic VFM distillation, web-scale CLIP pretraining, supervised ImageNet pretraining, and self-supervised DINO pretraining by up to 11.6%, 22.1%, 13.7%, and 29.8%, respectively. Furthermore, the proposed approach also demonstrates up to 9x, 4x and 15x reduction in pretraining compute cost when compared to task-agnostic VFM distillation, ImageNet pretraining and DINO pretraining, respectively, while outperforming them. We also show that the dataset used for transferring knowledge has a significant effect on the final target task performance, and introduce a retrieval-augmented knowledge transfer strategy that uses web-scale image retrieval to curate effective transfer sets.

APA


Vemulapalli, R., Pouransari, H., Faghri, F., Mehta, S., Farajtabar, M., Rastegari, M., & Tuzel, O. (2024). Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:49345-49367. Available from https://proceedings.mlr.press/v235/vemulapalli24a.html.
