A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Kevin H. Huang; Xing Liu; Andrew Duncan; Axel Gandy

A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Kevin H. Huang, Xing Liu, Andrew Duncan, Axel Gandy

Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:3827-3918, 2023.

Abstract

We prove a convergence theorem for U-statistics of degree two, where the data dimension

$d$ is allowed to scale with sample size

$n$ . We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite

$n$ and

$d$ , independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with

$d$ and the bandwidth.

Cite this Paper

BibTeX


@InProceedings{pmlr-v195-huang23a,
  title = 	 {A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing},
  author =       {Huang, Kevin H. and Liu, Xing and Duncan, Andrew and Gandy, Axel},
  booktitle = 	 {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages = 	 {3827--3918},
  year = 	 {2023},
  editor = 	 {Neu, Gergely and Rosasco, Lorenzo},
  volume = 	 {195},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {12--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v195/huang23a/huang23a.pdf},
  url = 	 {https://proceedings.mlr.press/v195/huang23a.html},
  abstract = 	 {We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.}
}

Endnote

%0 Conference Paper
%T A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing
%A Kevin H. Huang
%A Xing Liu
%A Andrew Duncan
%A Axel Gandy
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco	
%F pmlr-v195-huang23a
%I PMLR
%P 3827--3918
%U https://proceedings.mlr.press/v195/huang23a.html
%V 195
%X We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.

APA


Huang, K.H., Liu, X., Duncan, A. & Gandy, A.. (2023). A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:3827-3918 Available from https://proceedings.mlr.press/v195/huang23a.html.

Related Material

Download PDF