Fast determinantal point processes via distortion-free intermediate sampling

Michał Dereziński

Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1029-1049, 2019.

Abstract

Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O\big(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n\big)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new \textit{regularized} determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.
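To make the target distribution concrete, here is a minimal brute-force sketch, not the algorithm proposed in the paper. It assumes the distribution described above is the L-ensemble with $\Pr(S)\propto\det(\mathbf{X}_S\mathbf{X}_S^\top)$, where $\mathbf{X}_S$ denotes the rows indexed by $S$ and the empty set gets weight 1. It enumerates all $2^n$ subsets of a toy matrix, so it is only usable for tiny $n$, and it checks the result against the standard identity $\sum_S\det(\mathbf{X}_S\mathbf{X}_S^\top)=\det(\mathbf{I}+\mathbf{X}^\top\mathbf{X})$. The helper names and the example matrix are illustrative assumptions.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting.

    The determinant of the empty (0x0) matrix is 1 by convention.
    """
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def gram(rows):
    """Gram matrix of the given row vectors: entry (i, j) is <rows[i], rows[j]>."""
    return [[sum(x * y for x, y in zip(u, v)) for v in rows] for u in rows]


def subset_probabilities(X):
    """Return ({S: Pr(S)}, Z) with Pr(S) = det(X_S X_S^T) / Z, by enumeration."""
    n = len(X)
    weights = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            weights[S] = det(gram([X[i] for i in S]))
    Z = sum(weights.values())
    return {S: w / Z for S, w in weights.items()}, Z


# Toy 4x2 matrix (illustrative values, not from the paper).
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
probs, Z = subset_probabilities(X)

# Closed-form normalizer det(I + X^T X), computed from the d x d matrix only.
d = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
Z_closed = det([[XtX[i][j] + (1.0 if i == j else 0.0) for j in range(d)]
                for i in range(d)])

print(Z, Z_closed)
```

The enumeration costs time exponential in $n$, which is exactly the dependence the preprocessing and sampling bounds in the abstract are designed to avoid.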

Cite this Paper

BibTeX


@InProceedings{pmlr-v99-derezinski19a,
  title =     {Fast determinantal point processes via distortion-free intermediate sampling},
  author =    {Derezi{\'n}ski, Micha{\l}},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages =     {1029--1049},
  year =      {2019},
  editor =    {Beygelzimer, Alina and Hsu, Daniel},
  volume =    {99},
  series =    {Proceedings of Machine Learning Research},
  month =     {25--28 Jun},
  publisher = {PMLR},
  pdf =       {http://proceedings.mlr.press/v99/derezinski19a/derezinski19a.pdf},
  url =       {https://proceedings.mlr.press/v99/derezinski19a.html},
  abstract =  {Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O\big(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n\big)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new \textit{regularized} determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.}
}

Endnote

%0 Conference Paper
%T Fast determinantal point processes via distortion-free intermediate sampling
%A Michał Dereziński
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-derezinski19a
%I PMLR
%P 1029--1049
%U https://proceedings.mlr.press/v99/derezinski19a.html
%V 99
%X Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O\big(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n\big)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new \textit{regularized} determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.

APA


Dereziński, M. (2019). Fast determinantal point processes via distortion-free intermediate sampling. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:1029-1049. Available from https://proceedings.mlr.press/v99/derezinski19a.html.
