Last Layer Marginal Likelihood for Invariance Learning

Pola Schwöbel, Martin Jørgensen, Sebastian W. Ober, Mark van der Wilk
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:3542-3555, 2022.

Abstract

Data augmentation is often used to incorporate inductive biases into models. Traditionally, these are hand-crafted and tuned with cross-validation. The Bayesian paradigm for model selection provides a path towards end-to-end learning of invariances using only the training data, by optimising the marginal likelihood. Computing the marginal likelihood is hard for neural networks, but success with tractable approaches that compute the marginal likelihood for the last layer only raises the question of whether this convenient approach might be employed for learning invariances. We show partial success on standard benchmarks, in the low-data regime, and on a medical imaging dataset, by designing a custom optimisation routine. Introducing a new lower bound to the marginal likelihood allows us to perform inference for a larger class of likelihood functions than before. On the other hand, we demonstrate failure modes on the CIFAR-10 dataset, where the last-layer approximation is not sufficient due to the increased complexity of our neural network. Our results indicate that, once more sophisticated approximations become available, the marginal likelihood is a promising approach for invariance learning in neural networks.
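
To make the approach concrete, below is a minimal sketch of marginal-likelihood-based invariance selection with a Bayesian last layer. This is our illustration, not the authors' implementation: the toy rotation-invariant regression task, the random Fourier feature extractor, and the names features, invariant_features and log_marginal_likelihood are all assumptions. The paper learns continuous augmentation parameters by optimising a lower bound on the marginal likelihood; the sketch instead scores a small grid of rotation ranges theta with the exact evidence of a Bayesian linear last layer.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: the target depends only on the input radius, so the task
    # is invariant to 2-D rotations. (Hypothetical setup for illustration.)
    n = 200
    X = rng.uniform(-2.0, 2.0, size=(n, 2))
    y = np.sin(3.0 * np.linalg.norm(X, axis=1)) + 0.1 * rng.standard_normal(n)

    # Fixed random Fourier features stand in for a pretrained feature extractor.
    D = 300
    W = 2.0 * rng.standard_normal((2, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)

    def features(X):
        return np.sqrt(2.0 / D) * np.cos(X @ W + b)

    def invariant_features(X, theta, S=32):
        """Monte Carlo average of the features over rotations ~ U(-theta, theta)."""
        Phi = np.zeros((X.shape[0], D))
        for _ in range(S):
            a = rng.uniform(-theta, theta)
            R = np.array([[np.cos(a), -np.sin(a)],
                          [np.sin(a),  np.cos(a)]])
            Phi += features(X @ R.T)
        return Phi / S

    def log_marginal_likelihood(Phi, y, alpha=1.0, sigma=0.1):
        """Exact log evidence of a Bayesian linear last layer on features Phi:
        y ~ N(0, alpha^2 Phi Phi^T + sigma^2 I)."""
        K = alpha**2 * Phi @ Phi.T + sigma**2 * np.eye(len(y))
        L = np.linalg.cholesky(K)
        v = np.linalg.solve(L, y)  # so that v @ v = y^T K^{-1} y
        return -0.5 * v @ v - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2.0 * np.pi)

    # Model selection: score candidate amounts of invariance by the evidence.
    for theta in [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4, np.pi]:
        lml = log_marginal_likelihood(invariant_features(X, theta), y)
        print(f"rotation range {theta:4.2f} rad: log marginal likelihood {lml:9.2f}")

Because the targets depend only on the input radius, averaging the features over the full rotation group (theta near pi) concentrates the prior on rotation-invariant functions, so the evidence is expected to peak there; this is the same model-selection signal that the paper's optimisation routine follows by gradient-based learning of the augmentation parameters.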

Cite this Paper

BibTeX
@InProceedings{pmlr-v151-schwobel22a,
  title     = {Last Layer Marginal Likelihood for Invariance Learning},
  author    = {Schw\"obel, Pola and J{\o}rgensen, Martin and Ober, Sebastian W. and van der Wilk, Mark},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {3542--3555},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/schwobel22a/schwobel22a.pdf},
  url       = {https://proceedings.mlr.press/v151/schwobel22a.html},
  abstract  = {Data augmentation is often used to incorporate inductive biases into models. Traditionally, these are hand-crafted and tuned with cross validation. The Bayesian paradigm for model selection provides a path towards end-to-end learning of invariances using only the training data, by optimising the marginal likelihood. Computing the marginal likelihood is hard for neural networks, but success with tractable approaches that compute the marginal likelihood for the last layer only raises the question of whether this convenient approach might be employed for learning invariances. We show partial success on standard benchmarks, in the low-data regime and on a medical imaging dataset by designing a custom optimisation routine. Introducing a new lower bound to the marginal likelihood allows us to perform inference for a larger class of likelihood functions than before. On the other hand, we demonstrate failure modes on the CIFAR10 dataset, where the last layer approximation is not sufficient due to the increased complexity of our neural network. Our results indicate that once more sophisticated approximations become available the marginal likelihood is a promising approach for invariance learning in neural networks.}
}
Endnote
%0 Conference Paper
%T Last Layer Marginal Likelihood for Invariance Learning
%A Pola Schwöbel
%A Martin Jørgensen
%A Sebastian W. Ober
%A Mark van der Wilk
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-schwobel22a
%I PMLR
%P 3542--3555
%U https://proceedings.mlr.press/v151/schwobel22a.html
%V 151
%X Data augmentation is often used to incorporate inductive biases into models. Traditionally, these are hand-crafted and tuned with cross validation. The Bayesian paradigm for model selection provides a path towards end-to-end learning of invariances using only the training data, by optimising the marginal likelihood. Computing the marginal likelihood is hard for neural networks, but success with tractable approaches that compute the marginal likelihood for the last layer only raises the question of whether this convenient approach might be employed for learning invariances. We show partial success on standard benchmarks, in the low-data regime and on a medical imaging dataset by designing a custom optimisation routine. Introducing a new lower bound to the marginal likelihood allows us to perform inference for a larger class of likelihood functions than before. On the other hand, we demonstrate failure modes on the CIFAR10 dataset, where the last layer approximation is not sufficient due to the increased complexity of our neural network. Our results indicate that once more sophisticated approximations become available the marginal likelihood is a promising approach for invariance learning in neural networks.
APA
Schwöbel, P., Jørgensen, M., Ober, S.W. & van der Wilk, M. (2022). Last Layer Marginal Likelihood for Invariance Learning. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:3542-3555. Available from https://proceedings.mlr.press/v151/schwobel22a.html.
