Deep Reference Priors: What is the best way to pretrain a model?

Yansong Gao; Rahul Ramesh; Pratik Chaudhari

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:7036-7051, 2022.

Abstract

What is the best way to exploit extra data – be it unlabeled data from the same task, or labeled data from a related task – to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to maximally affect the Bayesian posterior, e.g., reference priors depend upon the number of samples available for learning the task and for very small sample sizes, the prior puts more probability mass on low-complexity models in the hypothesis space. This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. We develop generalizations of reference priors and demonstrate applications to two problems. First, by using unlabeled data to compute the reference prior, we develop new Bayesian semi-supervised learning methods that remain effective even with very few samples per class. Second, by using labeled data from the source task to compute the reference prior, we develop a new pretraining method for transfer learning that allows data from the target task to maximally affect the Bayesian posterior. Empirical validation of these methods is conducted on image classification datasets. Code is available at https://github.com/grasp-lyrl/deep_reference_priors
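The dependence of a reference prior on the sample size can be illustrated on a toy problem. The sketch below is not the method of the paper: it assumes a coin with unknown bias theta observed for n flips, restricts the prior to a fixed grid of theta values, and maximizes the mutual information between theta and the number of heads with the classical Blahut-Arimoto iteration. The names `binomial_likelihood` and `reference_prior_on_grid`, the grid size, and the iteration count are all hypothetical choices made for this illustration.

```python
import math


def binomial_likelihood(theta, n):
    """Return [p(k | theta) for k = 0..n] for n coin flips with bias theta."""
    return [
        math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)
        for k in range(n + 1)
    ]


def reference_prior_on_grid(n, grid_size=21, iterations=200):
    """Approximate the n-sample reference prior of a coin on a fixed grid.

    Illustrative assumption: Blahut-Arimoto updates are used to maximize
    the mutual information I(theta; k) between the bias theta and the
    number of heads k. Returns (thetas, prior, mutual_information in nats).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    likelihoods = [binomial_likelihood(t, n) for t in thetas]
    prior = [1.0 / grid_size] * grid_size  # start from a uniform prior

    def kl_to_marginal(current_prior):
        # Marginal q(k) = sum_i prior_i * p(k | theta_i).
        marginal = [
            sum(w * lik[k] for w, lik in zip(current_prior, likelihoods))
            for k in range(n + 1)
        ]
        # KL(p(. | theta_i) || q) per grid point; 0 * log 0 terms are skipped.
        return [
            sum(p * math.log(p / q) for p, q in zip(lik, marginal) if p > 0.0)
            for lik in likelihoods
        ]

    for _ in range(iterations):
        kls = kl_to_marginal(prior)
        # Blahut-Arimoto step: reweight each grid point by exp(KL), renormalize.
        weights = [w * math.exp(d) for w, d in zip(prior, kls)]
        total = sum(weights)
        prior = [w / total for w in weights]

    # Mutual information of the final prior: sum_i prior_i * KL_i.
    mutual_information = sum(
        w * d for w, d in zip(prior, kl_to_marginal(prior))
    )
    return thetas, prior, mutual_information


if __name__ == "__main__":
    for n in (1, 5, 20):
        thetas, prior, mi = reference_prior_on_grid(n)
        atoms = [(t, round(w, 3)) for t, w in zip(thetas, prior) if w > 1e-3]
        print(f"n={n}: I={mi:.3f} nats, atoms={atoms}")
```

With a single flip (n = 1) the prior on this grid concentrates on the two deterministic coins, theta = 0 and theta = 1, and the mutual information approaches log 2 nats. Running the script for larger n shows how the support of the prior changes as more samples become available.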

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-gao22d,
  title = 	 {Deep Reference Priors: What is the best way to pretrain a model?},
  author =       {Gao, Yansong and Ramesh, Rahul and Chaudhari, Pratik},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {7036--7051},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/gao22d/gao22d.pdf},
  url = 	 {https://proceedings.mlr.press/v162/gao22d.html},
  abstract = 	 {What is the best way to exploit extra data – be it unlabeled data from the same task, or labeled data from a related task – to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to maximally affect the Bayesian posterior, e.g., reference priors depend upon the number of samples available for learning the task and for very small sample sizes, the prior puts more probability mass on low-complexity models in the hypothesis space. This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. We develop generalizations of reference priors and demonstrate applications to two problems. First, by using unlabeled data to compute the reference prior, we develop new Bayesian semi-supervised learning methods that remain effective even with very few samples per class. Second, by using labeled data from the source task to compute the reference prior, we develop a new pretraining method for transfer learning that allows data from the target task to maximally affect the Bayesian posterior. Empirical validation of these methods is conducted on image classification datasets. Code is available at https://github.com/grasp-lyrl/deep_reference_priors}
}

Endnote

%0 Conference Paper
%T Deep Reference Priors: What is the best way to pretrain a model?
%A Yansong Gao
%A Rahul Ramesh
%A Pratik Chaudhari
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-gao22d
%I PMLR
%P 7036--7051
%U https://proceedings.mlr.press/v162/gao22d.html
%V 162
%X What is the best way to exploit extra data – be it unlabeled data from the same task, or labeled data from a related task – to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to maximally affect the Bayesian posterior, e.g., reference priors depend upon the number of samples available for learning the task and for very small sample sizes, the prior puts more probability mass on low-complexity models in the hypothesis space. This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. We develop generalizations of reference priors and demonstrate applications to two problems. First, by using unlabeled data to compute the reference prior, we develop new Bayesian semi-supervised learning methods that remain effective even with very few samples per class. Second, by using labeled data from the source task to compute the reference prior, we develop a new pretraining method for transfer learning that allows data from the target task to maximally affect the Bayesian posterior. Empirical validation of these methods is conducted on image classification datasets. Code is available at https://github.com/grasp-lyrl/deep_reference_priors

APA


Gao, Y., Ramesh, R. & Chaudhari, P. (2022). Deep Reference Priors: What is the best way to pretrain a model? Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:7036-7051. Available from https://proceedings.mlr.press/v162/gao22d.html.
