Marginal Regression For Multitask Learning


Mladen Kolar, Han Liu
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:647-655, 2012.


Variable selection is an important practical problem that arises in the analysis of many high-dimensional datasets. Convex optimization procedures that arise from relaxing the NP-hard subset selection problem, e.g., the Lasso or the Dantzig selector, have become the focus of intense theoretical investigation. Although many efficient algorithms exist for solving these problems, finding a solution when the number of variables is large, e.g., several hundred thousand in problems arising in genome-wide association analysis, is still computationally challenging. A practical solution for these high-dimensional problems is marginal regression, where the output is regressed on each variable separately. We investigate theoretical properties of marginal regression in a multitask framework. Our contributions include: i) a sharp analysis of marginal regression in a single-task setting with random design, ii) sufficient conditions for multitask screening to select the relevant variables, and iii) a lower bound on the Hamming distance convergence for multitask variable selection problems. A simulation study further demonstrates the performance of marginal regression.
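To make the idea of regressing the output on each variable separately concrete, the following is a minimal Python sketch of multitask marginal screening on simulated data. It is an illustration under stated assumptions, not the authors' implementation: the function name `marginal_screen`, aggregating the per-task marginal coefficients with an l2 norm, and keeping a fixed number `k` of top-ranked variables are all choices made here for the example, as are the simulation sizes.

```python
import numpy as np


def marginal_screen(X, Y, k):
    """Rank variables by marginal association with the outputs and keep the top k.

    X: (n, p) design matrix; Y: (n, T) matrix with one column per task.
    Assumes no column of X is constant (the column norms must be non-zero).
    """
    # Center and scale each column of X to unit norm; center each task's output.
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)
    Yc = Y - Y.mean(axis=0)

    # Marginal coefficients: entry (j, t) comes from regressing task t on
    # variable j alone, so no joint optimization over all p variables is needed.
    B = Xc.T @ Yc  # shape (p, T)

    # Aggregate across tasks (assumption: l2 norm of each row).
    scores = np.linalg.norm(B, axis=1)

    # Indices of the k largest scores, in decreasing order of score.
    selected = np.argsort(scores)[::-1][:k]
    return selected, scores


# Simulated example (hypothetical sizes): 5 relevant variables shared by 3 tasks.
rng = np.random.default_rng(0)
n, p, T = 500, 1000, 3
X = rng.standard_normal((n, p))
B_true = np.zeros((p, T))
B_true[:5] = 2.0
Y = X @ B_true + rng.standard_normal((n, T))

selected, scores = marginal_screen(X, Y, k=10)
print(sorted(selected.tolist()))
```

The screening step costs a single matrix product, which is what makes it attractive when the number of variables is very large. In practice `k` or a score threshold would have to be chosen, for example by a data-driven rule; that choice is left open in this sketch.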
