Fast Dynamic Sampling for Determinantal Point Processes

Zhao Song; Junze Yin; Lichen Zhang; Ruizhe Zhang

Fast Dynamic Sampling for Determinantal Point Processes

Zhao Song, Junze Yin, Lichen Zhang, Ruizhe Zhang

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:244-252, 2024.

Abstract

n this work, we provide fast dynamic algorithms for repeatedly sampling from distributions characterized by Determinantal Point Processes (DPPs) and Nonsymmetric Determinantal Point Processes (NDPPs). DPPs are a very well-studied class of distributions on subsets of items drawn from a ground set of cardinality $n$ characterized by a symmetric $n \times n$ kernel matrix $L$ such that the probability of any subset is proportional to the determinant of its corresponding principal submatrix. Recent work has shown that the kernel symmetry constraint can be relaxed, leading to NDPPs, which can better model data in several machine learning applications. Given a low-rank kernel matrix ${\cal L}=L+L^\top\in \mathbb{R}^{n\times n}$ and its corresponding eigendecomposition specified by $\{\lambda_i, u_i \}_{i=1}^d$ where $d\leq n$ is the rank, we design a data structure that uses $O(nd)$ space and preprocesses data in $O(nd^{\omega-1})$ time where $\omega\approx 2.37$ is the exponent of matrix multiplication. The data structure can generate a sample according to DPP distribution in time $O(|E|^3\log n+|E|^{\omega-1}d^2)$ or according to NDPP distribution in time $O((|E|^3 \log n+ |E|^{\omega-1}d^2)(1+w)^d)$ for $E$ being the sampled indices and $w$ is a data-dependent parameter. This improves upon the space and preprocessing time over prior works, and achieves a state-of-the-art sampling time when the sampling set is relatively dense. At the heart of our data structure is an efficient sampling tree that can leverage batch initialization and fast inner product query simultaneously.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-song24b,
  title = 	 {Fast Dynamic Sampling for Determinantal Point Processes},
  author =       {Song, Zhao and Yin, Junze and Zhang, Lichen and Zhang, Ruizhe},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {244--252},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/song24b/song24b.pdf},
  url = 	 {https://proceedings.mlr.press/v238/song24b.html},
  abstract = 	 {n this work, we provide fast dynamic algorithms for repeatedly sampling from distributions characterized by Determinantal Point Processes (DPPs) and Nonsymmetric Determinantal Point Processes (NDPPs). DPPs are a very well-studied class of distributions on subsets of items drawn from a ground set of cardinality $n$ characterized by a symmetric $n \times n$ kernel matrix $L$ such that the probability of any subset is proportional to the determinant of its corresponding principal submatrix. Recent work has shown that the kernel symmetry constraint can be relaxed, leading to NDPPs, which can better model data in several machine learning applications. Given a low-rank kernel matrix ${\cal L}=L+L^\top\in \mathbb{R}^{n\times n}$ and its corresponding eigendecomposition specified by $\{\lambda_i, u_i \}_{i=1}^d$ where $d\leq n$ is the rank, we design a data structure that uses $O(nd)$ space and preprocesses data in $O(nd^{\omega-1})$ time where $\omega\approx 2.37$ is the exponent of matrix multiplication. The data structure can generate a sample according to DPP distribution in time $O(|E|^3\log n+|E|^{\omega-1}d^2)$ or according to NDPP distribution in time $O((|E|^3 \log n+ |E|^{\omega-1}d^2)(1+w)^d)$ for $E$ being the sampled indices and $w$ is a data-dependent parameter. This improves upon the space and preprocessing time over prior works, and achieves a state-of-the-art sampling time when the sampling set is relatively dense. At the heart of our data structure is an efficient sampling tree that can leverage batch initialization and fast inner product query simultaneously.}
}

Endnote

%0 Conference Paper
%T Fast Dynamic Sampling for Determinantal Point Processes
%A Zhao Song
%A Junze Yin
%A Lichen Zhang
%A Ruizhe Zhang
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-song24b
%I PMLR
%P 244--252
%U https://proceedings.mlr.press/v238/song24b.html
%V 238
%X n this work, we provide fast dynamic algorithms for repeatedly sampling from distributions characterized by Determinantal Point Processes (DPPs) and Nonsymmetric Determinantal Point Processes (NDPPs). DPPs are a very well-studied class of distributions on subsets of items drawn from a ground set of cardinality $n$ characterized by a symmetric $n \times n$ kernel matrix $L$ such that the probability of any subset is proportional to the determinant of its corresponding principal submatrix. Recent work has shown that the kernel symmetry constraint can be relaxed, leading to NDPPs, which can better model data in several machine learning applications. Given a low-rank kernel matrix ${\cal L}=L+L^\top\in \mathbb{R}^{n\times n}$ and its corresponding eigendecomposition specified by $\{\lambda_i, u_i \}_{i=1}^d$ where $d\leq n$ is the rank, we design a data structure that uses $O(nd)$ space and preprocesses data in $O(nd^{\omega-1})$ time where $\omega\approx 2.37$ is the exponent of matrix multiplication. The data structure can generate a sample according to DPP distribution in time $O(|E|^3\log n+|E|^{\omega-1}d^2)$ or according to NDPP distribution in time $O((|E|^3 \log n+ |E|^{\omega-1}d^2)(1+w)^d)$ for $E$ being the sampled indices and $w$ is a data-dependent parameter. This improves upon the space and preprocessing time over prior works, and achieves a state-of-the-art sampling time when the sampling set is relatively dense. At the heart of our data structure is an efficient sampling tree that can leverage batch initialization and fast inner product query simultaneously.

APA

Song, Z., Yin, J., Zhang, L. & Zhang, R.. (2024). Fast Dynamic Sampling for Determinantal Point Processes. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:244-252 Available from https://proceedings.mlr.press/v238/song24b.html.

Related Material

Download PDF