Vision Transformers as Probabilistic Expansion from Learngene

Qiufeng Wang, Xu Yang, Haokun Chen, Xin Geng
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:52019-52032, 2024.

Abstract

Deep learning has advanced through the combination of large datasets and computational power, leading to the development of extensive pre-trained models like Vision Transformers (ViTs). However, these models often assume a one-size-fits-all utility, lacking the ability to initialize models with elastic scales tailored to the resource constraints of specific downstream tasks. To address these issues, we propose Probabilistic Expansion from LearnGene (PEG) for mixture sampling and elastic initialization of Vision Transformers. Specifically, PEG utilizes a probabilistic mixture approach to sample Multi-Head Self-Attention layers and Feed-Forward Networks from a large ancestry model into a more compact part termed the learngene. Theoretically, we demonstrate that the learngene can approximate the parameter distribution of the original ancestry model, thereby preserving its significant knowledge. Next, PEG expands the sampled learngene through non-linear mapping, enabling the initialization of descendant models with elastic scales to suit various resource constraints. Our extensive experiments demonstrate the effectiveness of PEG, which outperforms traditional initialization strategies.
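To make the two-step idea in the abstract concrete, the sketch below gives one possible reading of it in PyTorch. It is not the authors' implementation: the mixture weighting, the Expander module, the function names, and all layer sizes are illustrative assumptions, intended only to show (1) condensing ancestry layers into a compact learngene via a probabilistic mixture and (2) expanding that learngene through a non-linear mapping to initialize a descendant of a chosen depth.

import torch
import torch.nn as nn

def sample_learngene(ancestry_layers, num_gene_layers, temperature=1.0):
    """Form each learngene layer as a softmax-weighted mixture of all
    ancestry layers' parameters (one illustrative mixing scheme)."""
    gene_layers = []
    for _ in range(num_gene_layers):
        logits = torch.randn(len(ancestry_layers)) / temperature  # random mixture weights
        weights = torch.softmax(logits, dim=0)
        gene = {}
        for name, _ in ancestry_layers[0].named_parameters():
            stacked = torch.stack(
                [dict(l.named_parameters())[name].detach() for l in ancestry_layers]
            )
            shape = (-1,) + (1,) * (stacked.dim() - 1)
            gene[name] = (weights.view(shape) * stacked).sum(dim=0)
        gene_layers.append(gene)
    return gene_layers

class Expander(nn.Module):
    """Small non-linear (element-wise MLP) mapping from learngene parameters
    to descendant-layer parameters; a stand-in for the paper's mapping."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.GELU(), nn.Linear(hidden, 1))

    def forward(self, params):
        flat = params.flatten().unsqueeze(-1)  # treat each scalar independently
        return self.net(flat).squeeze(-1).view_as(params)

def init_descendant(gene_layers, descendant_depth, expander):
    """Initialize a descendant of an arbitrary (elastic) depth by cycling
    through the learngene layers and expanding each one."""
    return [
        {name: expander(p) for name, p in gene_layers[i % len(gene_layers)].items()}
        for i in range(descendant_depth)
    ]

# Toy usage: 12 ancestry encoder layers -> 3-layer learngene -> 6-layer descendant.
ancestry = [
    nn.TransformerEncoderLayer(d_model=192, nhead=3, dim_feedforward=384, batch_first=True)
    for _ in range(12)
]
with torch.no_grad():
    gene = sample_learngene(ancestry, num_gene_layers=3)
    descendant_init = init_descendant(gene, descendant_depth=6, expander=Expander())

In this sketch, depth elasticity comes from cycling the learngene layers through the expander; width elasticity would additionally require reshaping the expanded parameters, which is omitted here.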

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24cf,
  title = {Vision Transformers as Probabilistic Expansion from Learngene},
  author = {Wang, Qiufeng and Yang, Xu and Chen, Haokun and Geng, Xin},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {52019--52032},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24cf/wang24cf.pdf},
  url = {https://proceedings.mlr.press/v235/wang24cf.html},
  abstract = {Deep learning has advanced through the combination of large datasets and computational power, leading to the development of extensive pre-trained models like Vision Transformers (ViTs). However, these models often assume a one-size-fits-all utility, lacking the ability to initialize models with elastic scales tailored to the resource constraints of specific downstream tasks. To address these issues, we propose Probabilistic Expansion from LearnGene (PEG) for mixture sampling and elastic initialization of Vision Transformers. Specifically, PEG utilizes a probabilistic mixture approach to sample Multi-Head Self-Attention layers and Feed-Forward Networks from a large ancestry model into a more compact part termed as learngene. Theoretically, we demonstrate that these learngene can approximate the parameter distribution of the original ancestry model, thereby preserving its significant knowledge. Next, PEG expands the sampled learngene through non-linear mapping, enabling the initialization of descendant models with elastic scales to suit various resource constraints. Our extensive experiments demonstrate the effectiveness of PEG and outperforming traditional initialization strategies.}
}
Endnote
%0 Conference Paper
%T Vision Transformers as Probabilistic Expansion from Learngene
%A Qiufeng Wang
%A Xu Yang
%A Haokun Chen
%A Xin Geng
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24cf
%I PMLR
%P 52019--52032
%U https://proceedings.mlr.press/v235/wang24cf.html
%V 235
%X Deep learning has advanced through the combination of large datasets and computational power, leading to the development of extensive pre-trained models like Vision Transformers (ViTs). However, these models often assume a one-size-fits-all utility, lacking the ability to initialize models with elastic scales tailored to the resource constraints of specific downstream tasks. To address these issues, we propose Probabilistic Expansion from LearnGene (PEG) for mixture sampling and elastic initialization of Vision Transformers. Specifically, PEG utilizes a probabilistic mixture approach to sample Multi-Head Self-Attention layers and Feed-Forward Networks from a large ancestry model into a more compact part termed as learngene. Theoretically, we demonstrate that these learngene can approximate the parameter distribution of the original ancestry model, thereby preserving its significant knowledge. Next, PEG expands the sampled learngene through non-linear mapping, enabling the initialization of descendant models with elastic scales to suit various resource constraints. Our extensive experiments demonstrate the effectiveness of PEG and outperforming traditional initialization strategies.
APA
Wang, Q., Yang, X., Chen, H. & Geng, X. (2024). Vision Transformers as Probabilistic Expansion from Learngene. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:52019-52032. Available from https://proceedings.mlr.press/v235/wang24cf.html.
