Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning

Idan Achituve, Idit Diamant, Arnon Netzer, Gal Chechik, Ethan Fetaya
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:117-134, 2024.

Abstract

As machine learning becomes more prominent, there is a growing demand to perform several inference tasks in parallel. Multi-task learning (MTL) addresses this challenge by learning a single model that solves several tasks simultaneously and efficiently. Optimizing MTL models often entails first computing the gradient of the loss for each task and then aggregating all the gradients to obtain a combined update direction. However, common methods following this approach overlook an important aspect: the sensitivity of the individual gradient dimensions. Some dimensions may tolerate changes readily, while others may be more restrictive. Here, we introduce a novel gradient aggregation procedure using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induces a distribution over the gradients of the tasks. This valuable information allows us to quantify the uncertainty associated with each gradient dimension, which is factored in when aggregating them. We empirically demonstrate the benefits of our approach on a variety of datasets, achieving state-of-the-art performance.
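To make the idea concrete, below is a minimal sketch of one way per-dimension uncertainty could be factored into gradient aggregation: each task contributes a gradient mean and a per-dimension variance, and dimensions with lower variance (higher confidence) receive more weight via precision weighting. The function name and the precision-weighted rule are illustrative assumptions for intuition only, not the paper's exact Bayesian procedure.

```python
import numpy as np

def aggregate_gradients(grad_means, grad_vars, eps=1e-8):
    """Precision-weighted aggregation of per-task gradients (illustrative sketch).

    grad_means: (T, D) array, mean gradient of each of T tasks.
    grad_vars:  (T, D) array, per-dimension variance of each task's gradient,
                e.g. induced by a distribution over task-specific parameters.
    Returns a (D,) combined update direction in which dimensions a task is
    uncertain about (high variance) contribute less to the step.
    """
    precisions = 1.0 / (grad_vars + eps)           # (T, D) per-dimension confidence
    weights = precisions / precisions.sum(axis=0)  # normalize weights over tasks
    return (weights * grad_means).sum(axis=0)      # (D,) aggregated direction

# Toy example: two tasks, three shared-parameter dimensions.
means = np.array([[ 1.0, -2.0, 0.5],
                  [-1.0,  0.3, 0.4]])
variances = np.array([[0.01, 10.0, 0.1],   # task 1: confident in dims 0 and 2
                      [10.0, 0.01, 0.1]])  # task 2: confident in dim 1
update = aggregate_gradients(means, variances)
print(update)  # dims 0 and 1 follow the confident task; dim 2 is roughly averaged
```

In this toy setting, each dimension of the combined update is dominated by whichever task is most certain about it, rather than by a plain average of the task gradients.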

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-achituve24a, title = {{B}ayesian Uncertainty for Gradient Aggregation in Multi-Task Learning}, author = {Achituve, Idan and Diamant, Idit and Netzer, Arnon and Chechik, Gal and Fetaya, Ethan}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {117--134}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/achituve24a/achituve24a.pdf}, url = {https://proceedings.mlr.press/v235/achituve24a.html}, abstract = {As machine learning becomes more prominent there is a growing demand to perform several inference tasks in parallel. Multi-task learning (MTL) addresses this challenge by learning a single model that solves several tasks simultaneously and efficiently. Often optimizing MTL models entails first computing the gradient of the loss for each task, and then aggregating all the gradients to obtain a combined update direction. However, common methods following this approach do not consider an important aspect, the sensitivity in the dimensions of the gradients. Some dimensions may be more lenient for changes while others may be more restrictive. Here, we introduce a novel gradient aggregation procedure using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induce a distribution over the gradients of the tasks. This valuable information allows us to quantify the uncertainty associated with each of the gradients’ dimensions which is factored in when aggregating them. We empirically demonstrate the benefits of our approach in a variety of datasets, achieving state-of-the-art performance.} }
Endnote
%0 Conference Paper %T Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning %A Idan Achituve %A Idit Diamant %A Arnon Netzer %A Gal Chechik %A Ethan Fetaya %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-achituve24a %I PMLR %P 117--134 %U https://proceedings.mlr.press/v235/achituve24a.html %V 235 %X As machine learning becomes more prominent there is a growing demand to perform several inference tasks in parallel. Multi-task learning (MTL) addresses this challenge by learning a single model that solves several tasks simultaneously and efficiently. Often optimizing MTL models entails first computing the gradient of the loss for each task, and then aggregating all the gradients to obtain a combined update direction. However, common methods following this approach do not consider an important aspect, the sensitivity in the dimensions of the gradients. Some dimensions may be more lenient for changes while others may be more restrictive. Here, we introduce a novel gradient aggregation procedure using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induce a distribution over the gradients of the tasks. This valuable information allows us to quantify the uncertainty associated with each of the gradients’ dimensions which is factored in when aggregating them. We empirically demonstrate the benefits of our approach in a variety of datasets, achieving state-of-the-art performance.
APA
Achituve, I., Diamant, I., Netzer, A., Chechik, G. & Fetaya, E. (2024). Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:117-134. Available from https://proceedings.mlr.press/v235/achituve24a.html.
