Optimal Approximation of Doubly Stochastic Matrices
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3589-3598, 2020.
We consider the least-squares approximation of a matrix C in the set of doubly stochastic matrices with the same sparsity pattern as C. Our approach is based on applying the well-known Alternating Direction Method of Multipliers (ADMM) to a reformulation of the original problem. Our resulting algorithm requires an initial Cholesky factorization of a positive definite matrix that has the same sparsity pattern as C + I followed by simple iterations whose complexity is linear in the number of nonzeros in C, thus ensuring excellent scalability and speed. We demonstrate the advantages of our approach in a series of experiments on problems with up to 82 million nonzeros; these include normalizing large scale matrices arising from the 3D structure of the human genome, clustering applications, and the SuiteSparse matrix library. Overall, our experiments illustrate the outstanding scalability of our algorithm; matrices with millions of nonzeros can be approximated in a few seconds on modest desktop computing hardware.