Marginal Regression For Multitask Learning

Mladen Kolar, Han Liu
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:647-655, 2012.

Abstract

Variable selection is an important practical problem that arises in the analysis of many high-dimensional datasets. Convex optimization procedures that arise from relaxing the NP-hard subset selection procedure, e.g., the Lasso or Dantzig selector, have become the focus of intense theoretical investigation. Although many efficient algorithms exist that solve these problems, finding a solution when the number of variables is large, e.g., several hundred thousand in problems arising in genome-wide association analysis, is still computationally challenging. A practical solution for these high-dimensional problems is marginal regression, where the output is regressed on each variable separately. We investigate theoretical properties of marginal regression in a multitask framework. Our contributions include: i) a sharp analysis of marginal regression in a single-task setting with random design, ii) sufficient conditions for multitask screening to select the relevant variables, and iii) a lower bound on the Hamming distance convergence for multitask variable selection problems. A simulation study further demonstrates the performance of marginal regression.
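
As a rough illustration of the idea of regressing the output on each variable separately, the sketch below computes one no-intercept least-squares slope per column and keeps the variables with the largest absolute slopes. It is not the paper's procedure: the function names `marginal_coefficients` and `screen`, the top-`k` selection rule, the synthetic data, and the assumption that columns are on comparable scales are all introduced here for illustration only.

```python
import random


def marginal_coefficients(X, y):
    """Slope of the no-intercept least-squares fit of y on each column alone."""
    n, p = len(X), len(X[0])
    coefs = []
    for j in range(p):
        xy = sum(X[i][j] * y[i] for i in range(n))   # <x_j, y>
        xx = sum(X[i][j] ** 2 for i in range(n))     # <x_j, x_j>
        coefs.append(xy / xx if xx > 0 else 0.0)
    return coefs


def screen(X, y, k):
    """Keep the k variables with the largest absolute marginal slope.

    The top-k rule is an assumption of this sketch, not taken from the paper.
    """
    coefs = marginal_coefficients(X, y)
    ranked = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical synthetic data: 50 independent Gaussian variables,
# of which only columns 3 and 17 influence the output.
rng = random.Random(0)
n, p = 200, 50
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[3] - 2.0 * row[17] + 0.1 * rng.gauss(0, 1) for row in X]

selected = screen(X, y, k=2)
print(selected)
```

Each variable is handled with two inner products, so the cost grows linearly in the number of variables, which is what makes this kind of screening attractive when that number is very large.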

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-kolar12,
  title = {Marginal Regression For Multitask Learning},
  author = {Mladen Kolar and Han Liu},
  booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = {647--655},
  year = {2012},
  editor = {Neil D. Lawrence and Mark Girolami},
  volume = {22},
  series = {Proceedings of Machine Learning Research},
  address = {La Palma, Canary Islands},
  month = {21--23 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v22/kolar12/kolar12.pdf},
  url = {http://proceedings.mlr.press/v22/kolar12.html},
  abstract = {Variable selection is an important practical problem that arises in analysis of many high-dimensional datasets. Convex optimization procedures, that arise from relaxing the NP-hard subset selection procedure, e.g., the Lasso or Dantzig selector, have become the focus of intense theoretical investigations. Although many efficient algorithms exist that solve these problems, finding a solution when the number of variables is large, e.g., several hundreds of thousands in problems arising in genome-wide association analysis, is still computationally challenging. A practical solution for these high-dimensional problems is the marginal regression, where the output is regressed on each variable separately. We investigate theoretical properties of the marginal regression in a multitask framework. Our contributions include: i) sharp analysis for the marginal regression in a single task setting with random design, ii) sufficient conditions for the multitask screening to select the relevant variables, iii) a lower bound on the Hamming distance convergence for multitask variable selection problems. A simulation study further demonstrates the performance of the marginal regression.}
}
Endnote
%0 Conference Paper
%T Marginal Regression For Multitask Learning
%A Mladen Kolar
%A Han Liu
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-kolar12
%I PMLR
%J Proceedings of Machine Learning Research
%P 647--655
%U http://proceedings.mlr.press
%V 22
%W PMLR
%X Variable selection is an important practical problem that arises in analysis of many high-dimensional datasets. Convex optimization procedures, that arise from relaxing the NP-hard subset selection procedure, e.g., the Lasso or Dantzig selector, have become the focus of intense theoretical investigations. Although many efficient algorithms exist that solve these problems, finding a solution when the number of variables is large, e.g., several hundreds of thousands in problems arising in genome-wide association analysis, is still computationally challenging. A practical solution for these high-dimensional problems is the marginal regression, where the output is regressed on each variable separately. We investigate theoretical properties of the marginal regression in a multitask framework. Our contributions include: i) sharp analysis for the marginal regression in a single task setting with random design, ii) sufficient conditions for the multitask screening to select the relevant variables, iii) a lower bound on the Hamming distance convergence for multitask variable selection problems. A simulation study further demonstrates the performance of the marginal regression.
RIS
TY  - CPAPER
TI  - Marginal Regression For Multitask Learning
AU  - Mladen Kolar
AU  - Han Liu
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
PY  - 2012/03/21
DA  - 2012/03/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-kolar12
PB  - PMLR
SP  - 647
DP  - PMLR
EP  - 655
L1  - http://proceedings.mlr.press/v22/kolar12/kolar12.pdf
UR  - http://proceedings.mlr.press/v22/kolar12.html
AB  - Variable selection is an important practical problem that arises in analysis of many high-dimensional datasets. Convex optimization procedures, that arise from relaxing the NP-hard subset selection procedure, e.g., the Lasso or Dantzig selector, have become the focus of intense theoretical investigations. Although many efficient algorithms exist that solve these problems, finding a solution when the number of variables is large, e.g., several hundreds of thousands in problems arising in genome-wide association analysis, is still computationally challenging. A practical solution for these high-dimensional problems is the marginal regression, where the output is regressed on each variable separately. We investigate theoretical properties of the marginal regression in a multitask framework. Our contributions include: i) sharp analysis for the marginal regression in a single task setting with random design, ii) sufficient conditions for the multitask screening to select the relevant variables, iii) a lower bound on the Hamming distance convergence for multitask variable selection problems. A simulation study further demonstrates the performance of the marginal regression.
ER  -
APA
Kolar, M. & Liu, H. (2012). Marginal Regression For Multitask Learning. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in PMLR 22:647-655.
