Low-Rank Spectral Learning
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:522-530, 2014.
Spectral learning methods have recently been proposed as alternatives to slow, non-convex optimization algorithms like EM for a variety of probabilistic models in which hidden information must be inferred by the learner. These methods are typically controlled by a rank hyperparameter that sets the complexity of the model; when the model rank matches the true rank of the process generating the data, the resulting predictions are provably consistent and admit finite sample convergence bounds. However, in practice we usually do not know the true rank, and, in any event, from a computational and statistical standpoint it is likely to be prohibitively large. It is therefore of great practical interest to understand the behavior of low-rank spectral learning, where the model rank is less than the true rank. Counterintuitively, we show that even when the singular values omitted by lowering the rank are arbitrarily small, the resulting prediction errors can in fact be arbitrarily large. We identify two distinct possible causes for this bad behavior, and illustrate them with simple examples. We then show that these two causes are essentially complete: assuming that they do not occur, we can prove that the prediction error is bounded in terms of the magnitudes of the omitted singular values. We argue that the assumptions necessary for this result are relatively realistic, making low-rank spectral learning a viable option for many applications.