Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:649-658, 2018.
Abstract
A central problem in mining massive data streams is characterizing which functions of an underlying frequency vector can be approximated efficiently. Given the prevalence of large-scale linear algebra problems in machine learning, recently there has been considerable effort in extending this data stream problem to that of estimating functions of a matrix. This setting generalizes classical problems to the analogous ones for matrices. For example, instead of estimating frequent-item counts, we now wish to estimate “frequent-direction” counts. A related example is to estimate norms, which now correspond to estimating a vector norm on the singular values of the matrix. Despite recent efforts, the current understanding for such matrix problems is considerably weaker than that for vector problems. We study a number of aspects of estimating matrix norms in a stream that have not previously been considered: (1) multi-pass algorithms, (2) algorithms that see the underlying matrix one row at a time, and (3) time-efficient algorithms. Our multi-pass and row-order algorithms use less memory than what is provably required in the single-pass and entrywise-update models, and thus give separations between these models (in terms of memory). Moreover, all of our algorithms are considerably faster than previous ones. We also prove a number of lower bounds, and obtain, for instance, a near-complete characterization of the memory required of row-order algorithms for estimating Schatten $p$-norms of sparse matrices. We complement our results with numerical experiments.
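For reference, the Schatten $p$-norm mentioned above is the vector $p$-norm applied to the singular values. The display below is the standard textbook definition, not notation taken from this abstract; the symbols $A$, $n$, $d$, and $\sigma_i(A)$ are assumed here for illustration:

$$
\|A\|_{S_p} = \Big( \sum_{i=1}^{\min(n,d)} \sigma_i(A)^p \Big)^{1/p}, \qquad A \in \mathbb{R}^{n \times d},
$$

where $\sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge 0$ are the singular values of $A$. Under this definition, $p = 1$ gives the nuclear (trace) norm, $p = 2$ gives the Frobenius norm, and the limit $p \to \infty$ gives the operator norm $\sigma_1(A)$.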