Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling

Mojmir Mutny; Michal Derezinski; Andreas Krause

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3110-3120, 2020.

Abstract

We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix M instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of M. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of matrix M, and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of M. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior to or on par with uniform sampling.
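As a rough illustration of the kind of update the abstract describes, the sketch below runs a block coordinate step on a small quadratic, drawing each block S with probability proportional to det(M[S, S]). It is not the authors' implementation. The fixed block size k, the brute-force enumeration of subsets (feasible only for tiny n), the quadratic test objective, and all function names are assumptions made for this example.

```python
import itertools
import numpy as np


def determinantal_block_probs(M, k):
    """List all size-k index subsets S with P(S) proportional to det(M[S, S]).

    Brute-force enumeration: an assumption for illustration, only usable for small n.
    """
    subsets = list(itertools.combinations(range(M.shape[0]), k))
    dets = np.array([np.linalg.det(M[np.ix_(S, S)]) for S in subsets])
    return subsets, dets / dets.sum()


def block_newton_step(x, grad, M, S):
    """Update only the coordinates in S, using the inverse of the block M[S, S]."""
    S = list(S)  # a list triggers fancy indexing on 1-D arrays
    x_new = x.copy()
    x_new[S] -= np.linalg.solve(M[np.ix_(S, S)], grad[S])
    return x_new


def run_demo(n=6, k=2, steps=300, seed=0):
    """Minimize f(x) = 0.5 x^T M x - b^T x (hypothetical test problem)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    M = A @ A.T + n * np.eye(n)  # symmetric positive definite
    b = rng.standard_normal(n)

    def f(x):
        return 0.5 * x @ M @ x - b @ x

    subsets, probs = determinantal_block_probs(M, k)
    x = np.zeros(n)
    values = [f(x)]
    for _ in range(steps):
        S = subsets[rng.choice(len(subsets), p=probs)]
        x = block_newton_step(x, M @ x - b, M, S)  # gradient of f is M x - b
        values.append(f(x))
    f_star = f(np.linalg.solve(M, b))  # exact minimum for comparison
    return values, f_star


if __name__ == "__main__":
    values, f_star = run_demo()
    print("final optimality gap:", values[-1] - f_star)
```

For this quadratic, M is the exact Hessian, so each step minimizes the objective exactly over the sampled block and the objective value never increases. For a general smooth convex objective, M would instead be an upper bound on the Hessian, as stated in the abstract.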

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-mutny20a,
  title = 	 {Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling},
  author =       {Mutny, Mojmir and Derezinski, Michal and Krause, Andreas},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3110--3120},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/mutny20a/mutny20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/mutny20a.html},
  abstract = 	 {We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix M instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of M. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of matrix M, and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of M. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior to or on par with uniform sampling.}
}

Endnote

%0 Conference Paper
%T Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling
%A Mojmir Mutny
%A Michal Derezinski
%A Andreas Krause
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-mutny20a
%I PMLR
%P 3110--3120
%U https://proceedings.mlr.press/v108/mutny20a.html
%V 108
%X We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix M instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of M. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of matrix M, and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of M. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior to or on par with uniform sampling.

APA


Mutny, M., Derezinski, M. & Krause, A. (2020). Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3110-3120. Available from https://proceedings.mlr.press/v108/mutny20a.html.
