Improved Active Multi-Task Representation Learning via Lasso

Yiping Wang, Yifang Chen, Kevin Jamieson, Simon Shaolei Du
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:35548-35578, 2023.

Abstract

To leverage the copious amount of data from source tasks and overcome the scarcity of target task samples, representation learning based on multi-task pretraining has become a standard approach in many applications. However, most existing works design their source task selection strategies from a purely empirical perspective. Recently, Chen et al. (2022) gave the first active multi-task representation learning (A-MTRL) algorithm, which adaptively samples from source tasks and provably reduces the total sample complexity using the L2-regularized target-source relevance parameter ν^2. However, their approach is theoretically suboptimal in terms of total source sample complexity and is less practical in real-world scenarios where sparse source task selection is desired. In this paper, we address both issues. Specifically, we show the strict dominance of the L1-regularized-relevance-based (ν^1-based) strategy by giving a lower bound for the ν^2-based strategy. When ν^1 is unknown, we propose a practical algorithm that uses the LASSO program to estimate ν^1. Our algorithm successfully recovers the optimal result of the case where ν^1 is known. In addition to our sample complexity results, we also characterize the potential of our ν^1-based strategy in sample-cost-sensitive settings. Finally, we provide experiments on real-world computer vision datasets to illustrate the effectiveness of our proposed method.
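To make the ν^1-based idea concrete, below is a minimal sketch, not the paper's exact algorithm: it assumes a learned shared representation with per-source linear heads, estimates a sparse relevance vector over source tasks with scikit-learn's Lasso, and then splits a source-sample budget in proportion to the entries' magnitudes. All dimensions, the synthetic data, and the regularization weight are illustrative assumptions; the paper's actual LASSO program and allocation rule differ in their details.

```python
# Illustrative sketch (not the paper's exact algorithm): estimate a sparse
# relevance vector nu over source tasks via a Lasso program, then allocate
# the source-sample budget proportionally to |nu|.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

T, k, n0 = 50, 8, 100              # hypothetical: #source tasks, representation dim, #target samples
W = rng.normal(size=(k, T))        # hypothetical source-task linear heads (from pretraining)
phi_X0 = rng.normal(size=(n0, k))  # target inputs mapped through the learned representation
# Synthetic target labels: a sparse combination of the first three source heads plus noise.
y0 = phi_X0 @ (W[:, :3] @ np.array([0.6, -0.3, 0.5])) + 0.01 * rng.normal(size=n0)

# Lasso regression of target labels on the source heads' predictions:
# y0 ≈ (phi(X0) W) nu, with an L1 penalty encouraging a sparse nu.
nu = Lasso(alpha=0.01, fit_intercept=False).fit(phi_X0 @ W, y0).coef_

# Sample-allocation heuristic: draw source samples in proportion to |nu|,
# so irrelevant sources (nu_i = 0) receive no further sampling budget.
budget = 10_000
p = np.abs(nu) / np.abs(nu).sum()
n_per_task = np.floor(budget * p).astype(int)
print("active source tasks:", np.flatnonzero(n_per_task))
```

Sparsity is the point of the L1 penalty here: with an L2-style relevance, every source task retains nonzero weight and must be sampled, whereas the Lasso estimate zeroes out irrelevant sources so the budget concentrates on the few tasks that matter.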

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-wang23b,
  title     = {Improved Active Multi-Task Representation Learning via Lasso},
  author    = {Wang, Yiping and Chen, Yifang and Jamieson, Kevin and Du, Simon Shaolei},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {35548--35578},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/wang23b/wang23b.pdf},
  url       = {https://proceedings.mlr.press/v202/wang23b.html}
}
APA
Wang, Y., Chen, Y., Jamieson, K., & Du, S. S. (2023). Improved Active Multi-Task Representation Learning via Lasso. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:35548-35578. Available from https://proceedings.mlr.press/v202/wang23b.html.
