Variational Learning is Effective for Large Deep Networks

Yuesong Shen; Nico Daheim; Bai Cong; Peter Nickl; Gian Maria Marconi; Bazan Clement Emile Marcel Raoul; Rio Yokota; Iryna Gurevych; Daniel Cremers; Mohammad Emtiyaz Khan; Thomas Möllenhoff

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:44665-44686, 2024.

Abstract

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-shen24b,
  title = 	 {Variational Learning is Effective for Large Deep Networks},
  author =       {Shen, Yuesong and Daheim, Nico and Cong, Bai and Nickl, Peter and Marconi, Gian Maria and Raoul, Bazan Clement Emile Marcel and Yokota, Rio and Gurevych, Iryna and Cremers, Daniel and Khan, Mohammad Emtiyaz and M\"{o}llenhoff, Thomas},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {44665--44686},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/shen24b/shen24b.pdf},
  url = 	 {https://proceedings.mlr.press/v235/shen24b.html},
  abstract = 	 {We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.}
}

Endnote

%0 Conference Paper
%T Variational Learning is Effective for Large Deep Networks
%A Yuesong Shen
%A Nico Daheim
%A Bai Cong
%A Peter Nickl
%A Gian Maria Marconi
%A Bazan Clement Emile Marcel Raoul
%A Rio Yokota
%A Iryna Gurevych
%A Daniel Cremers
%A Mohammad Emtiyaz Khan
%A Thomas Möllenhoff
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-shen24b
%I PMLR
%P 44665--44686
%U https://proceedings.mlr.press/v235/shen24b.html
%V 235
%X We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.

APA

Shen, Y., Daheim, N., Cong, B., Nickl, P., Marconi, G.M., Raoul, B.C.E.M., Yokota, R., Gurevych, I., Cremers, D., Khan, M.E. & Möllenhoff, T. (2024). Variational Learning is Effective for Large Deep Networks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:44665-44686. Available from https://proceedings.mlr.press/v235/shen24b.html.
