A Faster Sampler for Discrete Determinantal Point Processes

Simon Barthelmé, Nicolas Tremblay, Pierre-Olivier Amblard
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5582-5592, 2023.

Abstract

Discrete Determinantal Point Processes (DPPs) have a wide array of potential applications for subsampling datasets. They are, however, held back in some cases by the high cost of sampling. In the worst-case scenario, the sampling cost scales as $O(n^3)$, where $n$ is the number of elements of the ground set. A popular workaround to this prohibitive cost is to sample DPPs defined by low-rank kernels. In such cases, the cost of standard sampling algorithms scales as $O(np^2 + nm^2)$, where $m$ is the (average) number of samples of the DPP (usually $m \ll n$) and $p$ the rank of the kernel used to define the DPP ($m \leq p \leq n$). The first term, $O(np^2)$, comes from an SVD-like step. We focus here on the second term of this cost, $O(nm^2)$, and show that it can be brought down to $O(nm + m^3 \log m)$ without any loss of exactness. In practice, we observe very substantial speedups compared to the classical algorithm as soon as $n > 1000$. The algorithm described here is a close variant of the standard algorithm for sampling continuous DPPs, and uses rejection sampling. In the specific case of projection DPPs, we also show that any additional sample can be drawn in time $O(m^3 \log m)$. Finally, an interesting by-product of the analysis is that a realisation from a DPP is typically contained in a subset of size $O(m \log m)$ formed using leverage score i.i.d. sampling.
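To make the $O(nm^2)$ term being improved concrete, below is a minimal sketch of the classical sequential (chain-rule) sampler for a projection DPP with kernel $K = QQ^T$; it is not the paper's algorithm, and the function name, NumPy usage, and test sizes are our own illustrative choices. Each of the $m$ steps rescans all $n$ conditional leverage scores, which is where the $O(nm^2)$ cost comes from; per the abstract, the paper's variant avoids this rescan via rejection sampling, bringing the cost down to $O(nm + m^3 \log m)$.

import numpy as np

def sample_projection_dpp(Q, rng=None):
    """Classical chain-rule sampler for a projection DPP with kernel
    K = Q @ Q.T, where Q (n x m) has orthonormal columns. Each of the m
    steps rescans all n conditional leverage scores, hence the O(nm^2)
    cost discussed in the abstract."""
    rng = np.random.default_rng() if rng is None else rng
    V = np.array(Q, dtype=float, copy=True)
    n, m = V.shape
    sample = []
    for _ in range(m):
        # Conditional inclusion probabilities = diagonal of the current
        # (conditioned) kernel = squared row norms of V (leverage scores).
        scores = np.einsum("ij,ij->i", V, V)
        probs = np.clip(scores, 0.0, None)
        i = rng.choice(n, p=probs / probs.sum())
        sample.append(int(i))
        # Condition on item i being in the sample: Schur-complement update
        # K <- K - K[:, i] K[i, :] / K[i, i], applied to the factor V.
        # This zeroes row i, so i cannot be drawn again.
        v = V[i] / np.sqrt(scores[i])
        V -= np.outer(V @ v, v)
    return sample

# Example: a random rank-5 projection kernel over n = 2000 items.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((2000, 5)))
print(sorted(sample_projection_dpp(Q, rng)))  # five distinct indices

The first iteration above samples from exactly the leverage score distribution; the abstract's by-product result says that a set of $O(m \log m)$ i.i.d. draws from that fixed distribution typically already contains the DPP realisation, which is what makes a rejection-based scheme plausible.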

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-barthelme23a,
  title     = {A Faster Sampler for Discrete Determinantal Point Processes},
  author    = {Barthelm\'e, Simon and Tremblay, Nicolas and Amblard, Pierre-Olivier},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {5582--5592},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/barthelme23a/barthelme23a.pdf},
  url       = {https://proceedings.mlr.press/v206/barthelme23a.html}
}
EndNote
%0 Conference Paper
%T A Faster Sampler for Discrete Determinantal Point Processes
%A Simon Barthelmé
%A Nicolas Tremblay
%A Pierre-Olivier Amblard
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-barthelme23a
%I PMLR
%P 5582--5592
%U https://proceedings.mlr.press/v206/barthelme23a.html
%V 206
APA
Barthelmé, S., Tremblay, N. & Amblard, P.-O. (2023). A Faster Sampler for Discrete Determinantal Point Processes. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:5582-5592. Available from https://proceedings.mlr.press/v206/barthelme23a.html.
