Kernel Thinning
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:1753-1753, 2021.
Abstract
We introduce kernel thinning, a new procedure for compressing a distribution $P$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $k$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $P$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $P$ and $O_d(n^{-1/2}\sqrt{(\log n)^{d+1}\log\log n})$ for sub-exponential $P$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $P$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $P$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning.
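The quantity bounded above is the worst-case integration error over the unit ball of the RKHS, i.e., the maximum mean discrepancy (MMD) between the $n$-point input and its $\sqrt{n}$-point compression. The sketch below is only an illustration of that comparison, not the paper's kernel thinning (halving) algorithm: it measures the Gaussian-kernel MMD of a uniform i.i.d. $\sqrt{n}$-point subsample against a greedy kernel-herding-style stand-in coreset. The helper names (`gaussian_kernel`, `mmd`), the bandwidth choice, and the herding selection rule are all illustrative assumptions, not part of the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    """MMD between the empirical distributions of the rows of X and Y."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return np.sqrt(max(Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean(), 0.0))

# Compare a sqrt(n)-point i.i.d. subsample with a greedy kernel-based coreset.
rng = np.random.default_rng(0)
n, d = 1024, 2
X = rng.normal(size=(n, d))   # n-point input approximation to P (here, a toy Gaussian sample)
m = int(np.sqrt(n))           # target size sqrt(n)

# Baseline: uniform i.i.d. subsample of size m (suffers the Omega(n^{-1/4}) rate).
iid_idx = rng.choice(n, size=m, replace=False)

# Stand-in coreset: kernel-herding-style greedy selection (NOT the paper's algorithm),
# chosen only to show how a kernel-aware selection can shrink the MMD to the input.
K = gaussian_kernel(X, X)
col_means = K.mean(axis=1)    # empirical kernel mean embedding evaluated at each point
selected = []
running_sum = np.zeros(n)     # sum over already-selected points j of k(x_i, x_j)
for t in range(m):
    scores = col_means - running_sum / (t + 1)
    scores[selected] = -np.inf          # select without replacement
    best = int(np.argmax(scores))
    selected.append(best)
    running_sum += K[best]

print("MMD(i.i.d. subsample, X): %.4f" % mmd(X[iid_idx], X))
print("MMD(greedy coreset,   X): %.4f" % mmd(X[selected], X))
```

On typical draws the greedy coreset attains a noticeably smaller MMD to the full sample than the i.i.d. subsample of the same size, which is the gap the paper's non-asymptotic guarantees quantify for kernel thinning itself.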