Column Subset Selection with Missing Data via Active Sampling

Yining Wang, Aarti Singh
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:1033-1041, 2015.

Abstract

Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy low-rank structure, one algorithm enjoys a relative error bound.

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-wang15c, title = {{Column Subset Selection with Missing Data via Active Sampling}}, author = {Yining Wang and Aarti Singh}, booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics}, pages = {1033--1041}, year = {2015}, editor = {Guy Lebanon and S. V. N. Vishwanathan}, volume = {38}, series = {Proceedings of Machine Learning Research}, address = {San Diego, California, USA}, month = {09--12 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v38/wang15c.pdf}, url = { http://proceedings.mlr.press/v38/wang15c.html }, abstract = {Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy low-rank structure, one algorithm enjoys a relative error bound.} }
Endnote
%0 Conference Paper %T Column Subset Selection with Missing Data via Active Sampling %A Yining Wang %A Aarti Singh %B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2015 %E Guy Lebanon %E S. V. N. Vishwanathan %F pmlr-v38-wang15c %I PMLR %P 1033--1041 %U http://proceedings.mlr.press/v38/wang15c.html %V 38 %X Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy low-rank structure, one algorithm enjoys a relative error bound.
RIS
TY - CPAPER TI - Column Subset Selection with Missing Data via Active Sampling AU - Yining Wang AU - Aarti Singh BT - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics DA - 2015/02/21 ED - Guy Lebanon ED - S. V. N. Vishwanathan ID - pmlr-v38-wang15c PB - PMLR DP - Proceedings of Machine Learning Research VL - 38 SP - 1033 EP - 1041 L1 - http://proceedings.mlr.press/v38/wang15c.pdf UR - http://proceedings.mlr.press/v38/wang15c.html AB - Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy low-rank structure, one algorithm enjoys a relative error bound. ER -
APA
Wang, Y. & Singh, A.. (2015). Column Subset Selection with Missing Data via Active Sampling. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:1033-1041 Available from http://proceedings.mlr.press/v38/wang15c.html .

Related Material