When is Transfer Learning Possible?

My Phan, Kianté Brantley, Stephanie Milani, Soroush Mehri, Gokul Swamy, Geoffrey J. Gordon
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:40642-40666, 2024.

Abstract

We present a general framework for transfer learning that is flexible enough to capture transfer in supervised, reinforcement, and imitation learning. Our framework enables new insights into the fundamental question of when we can successfully transfer learned information across problems. We model the learner as interacting with a sequence of problem instances, or environments, each of which is generated from a common structural causal model (SCM) by choosing the SCM’s parameters from restricted sets. We derive a procedure that can propagate restrictions on SCM parameters through the SCM’s graph structure to other parameters that we are trying to learn. The propagated restrictions then enable more efficient learning (i.e., transfer). By analyzing the procedure, we are able to challenge widely-held beliefs about transfer learning. First, we show that having sparse changes across environments is neither necessary nor sufficient for transfer. Second, we show an example where the common heuristic of freezing a layer in a network causes poor transfer performance. We then use our procedure to select a more refined set of parameters to freeze, leading to successful transfer learning.
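To make the layer-freezing discussion concrete, here is a minimal, hypothetical sketch; it is not the paper's procedure (which selects the refined frozen set by propagating restrictions through the SCM's graph structure), but it illustrates the difference between freezing an entire layer and freezing only a chosen subset of that layer's parameters when fine-tuning on a new environment. The toy model, the choice of "invariant" rows, and all names below are illustrative assumptions.

    # Hypothetical sketch: coarse layer freezing vs. finer-grained parameter freezing.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(8, 16),  # layer 0: assumed partly shared across environments
        nn.ReLU(),
        nn.Linear(16, 1),  # layer 2: environment-specific head
    )

    # Coarse heuristic: freeze all of layer 0.
    # for p in model[0].parameters():
    #     p.requires_grad = False

    # Finer-grained alternative: keep layer 0 trainable, but zero the gradients of the
    # rows we believe are invariant across environments, so only the rest adapts.
    invariant_rows = torch.tensor([0, 1, 2, 3])   # hypothetical choice of rows to hold fixed
    grad_mask = torch.ones_like(model[0].weight)
    grad_mask[invariant_rows] = 0.0
    model[0].weight.register_hook(lambda g: g * grad_mask)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    # One fine-tuning step on toy data from the target environment.
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The gradient mask is one simple way to freeze an arbitrary subset of entries rather than a whole layer; which entries should be held fixed is exactly the question the paper's propagation procedure is meant to answer.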

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-phan24a,
  title     = {When is Transfer Learning Possible?},
  author    = {Phan, My and Brantley, Kiant\'{e} and Milani, Stephanie and Mehri, Soroush and Swamy, Gokul and Gordon, Geoffrey J.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {40642--40666},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/phan24a/phan24a.pdf},
  url       = {https://proceedings.mlr.press/v235/phan24a.html},
  abstract  = {We present a general framework for transfer learning that is flexible enough to capture transfer in supervised, reinforcement, and imitation learning. Our framework enables new insights into the fundamental question of when we can successfully transfer learned information across problems. We model the learner as interacting with a sequence of problem instances, or environments, each of which is generated from a common structural causal model (SCM) by choosing the SCM’s parameters from restricted sets. We derive a procedure that can propagate restrictions on SCM parameters through the SCM’s graph structure to other parameters that we are trying to learn. The propagated restrictions then enable more efficient learning (i.e., transfer). By analyzing the procedure, we are able to challenge widely-held beliefs about transfer learning. First, we show that having sparse changes across environments is neither necessary nor sufficient for transfer. Second, we show an example where the common heuristic of freezing a layer in a network causes poor transfer performance. We then use our procedure to select a more refined set of parameters to freeze, leading to successful transfer learning.}
}
Endnote
%0 Conference Paper
%T When is Transfer Learning Possible?
%A My Phan
%A Kianté Brantley
%A Stephanie Milani
%A Soroush Mehri
%A Gokul Swamy
%A Geoffrey J. Gordon
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-phan24a
%I PMLR
%P 40642--40666
%U https://proceedings.mlr.press/v235/phan24a.html
%V 235
%X We present a general framework for transfer learning that is flexible enough to capture transfer in supervised, reinforcement, and imitation learning. Our framework enables new insights into the fundamental question of when we can successfully transfer learned information across problems. We model the learner as interacting with a sequence of problem instances, or environments, each of which is generated from a common structural causal model (SCM) by choosing the SCM’s parameters from restricted sets. We derive a procedure that can propagate restrictions on SCM parameters through the SCM’s graph structure to other parameters that we are trying to learn. The propagated restrictions then enable more efficient learning (i.e., transfer). By analyzing the procedure, we are able to challenge widely-held beliefs about transfer learning. First, we show that having sparse changes across environments is neither necessary nor sufficient for transfer. Second, we show an example where the common heuristic of freezing a layer in a network causes poor transfer performance. We then use our procedure to select a more refined set of parameters to freeze, leading to successful transfer learning.
APA
Phan, M., Brantley, K., Milani, S., Mehri, S., Swamy, G., & Gordon, G. J. (2024). When is Transfer Learning Possible? Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:40642-40666. Available from https://proceedings.mlr.press/v235/phan24a.html.