Regression Learning with Limited Observations of Multivariate Outcomes and Features
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:47174-47197, 2024.
Abstract
Multivariate linear regression models are broadly used to capture relationships between outcomes and features. However, their effectiveness is compromised by missing observations, a ubiquitous challenge in real-world applications. Considering a scenario where learners can access only a limited number of components of both outcomes and features, we develop efficient algorithms tailored for the least squares ($L_2$) and least absolute ($L_1$) loss functions, each coupled with a ridge-like and Lasso-type penalty, respectively. Moreover, we establish rigorous error bounds for all proposed algorithms. Notably, our $L_2$ loss function algorithms are probably approximately correct (PAC), distinguishing them from their $L_1$ counterparts. Extensive numerical experiments show that our approach outperforms the naive approach of applying existing univariate-outcome algorithms separately to each coordinate of the multivariate outcome. Further, using the $L_1$ loss function or introducing a Lasso-type penalty can improve predictions in the presence of outliers or high-dimensional features. This research contributes valuable insights into addressing the challenges posed by incomplete data.