A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions
Proceedings of Machine Learning Research, PMLR 89:1640-1649, 2019.
Abstract
The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. The test is readily implemented using a simulation-based, linear-time procedure. The testing procedure can be customized by the practitioner using knowledge of the underlying data domain. Unlike most existing test statistics, the proposed test statistic is distribution-free and its exact (nonasymptotic) sampling distribution is known in closed form. We establish consistency of the test against all alternatives by showing that the test statistic is distributed as a discrete uniform if and only if the samples were drawn from the candidate distribution. We illustrate its efficacy for assessing the sample quality of approximate sampling algorithms over combinatorially large spaces with intractable probabilities, including random partitions in Dirichlet process mixture models and random lattices in Ising models.
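The abstract gives no pseudocode, so the following is only a minimal sketch of how a rank-based, simulation-based statistic of this kind could be computed. It is not the paper's reference implementation. It assumes that each observation is ranked among `m` fresh simulations from the candidate distribution under a user-chosen ordering, with ties broken uniformly at random, and that the resulting ranks are then compared against the discrete uniform on {0, ..., m}. The names `stochastic_rank`, `rank_histogram`, `chi_square_uniform`, `sample_candidate`, and `key` are hypothetical, and the Pearson chi-square statistic is one possible choice for the uniformity check.

```python
import random


def stochastic_rank(y, sample_candidate, m, key, rng):
    """Rank of y among m fresh candidate draws, ties broken uniformly.

    Assumption: `sample_candidate(rng)` returns one draw from the candidate
    distribution, and `key` maps a domain element to a sortable value that
    encodes the practitioner's chosen ordering.
    """
    draws = [sample_candidate(rng) for _ in range(m)]
    ky = key(y)
    less = sum(1 for x in draws if key(x) < ky)
    ties = sum(1 for x in draws if key(x) == ky)
    # y is placed uniformly at random among the elements it ties with,
    # so the result is an integer in {less, ..., less + ties}.
    return less + rng.randint(0, ties)


def rank_histogram(observations, sample_candidate, m, key, rng):
    """Counts of each rank value 0..m over all observations (linear time)."""
    counts = [0] * (m + 1)
    for y in observations:
        counts[stochastic_rank(y, sample_candidate, m, key, rng)] += 1
    return counts


def chi_square_uniform(counts):
    """Pearson chi-square statistic against the discrete uniform."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy candidate: three independent fair bits, ordered lexicographically.
    def candidate(r):
        return tuple(r.randint(0, 1) for _ in range(3))

    good = [candidate(rng) for _ in range(2000)]   # drawn from the candidate
    bad = [(1, 1, 1)] * 2000                       # clearly not from it

    for name, data in [("matching", good), ("mismatched", bad)]:
        counts = rank_histogram(data, candidate, 4, lambda x: x, rng)
        print(name, counts, round(chi_square_uniform(counts), 2))
```

When the observations do come from the candidate, each observation and its `m` simulations are exchangeable, so the tie-broken rank is uniform on {0, ..., m} regardless of the candidate. This is the distribution-free property the abstract refers to. The `key` argument is where domain knowledge about the ordering would enter.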