A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Kevin H. Huang, Xing Liu, Andrew Duncan, Axel Gandy
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:3827-3918, 2023.

Abstract

We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.
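For context, the classical fixed-$d$ theory that this phase transition interpolates between can be recalled as follows; this is a standard recap (in the spirit of Hoeffding's decomposition), not the paper's theorem statement, and the exact moment ratio governing the transition is given in the paper itself. A U-statistic of degree two with symmetric kernel $h$ and i.i.d. data $X_1, \dots, X_n$ is

$U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h(X_i, X_j), \qquad \theta = \mathbb{E}[h(X_1, X_2)].$

Writing $\zeta_1 = \mathrm{Var}\big(\mathbb{E}[h(X_1, X_2) \mid X_1]\big)$, the classical fixed-$d$ limits are

$\sqrt{n}\,(U_n - \theta) \xrightarrow{d} \mathcal{N}(0, 4\zeta_1)$ if $\zeta_1 > 0$ (non-degenerate), and $n\,(U_n - \theta) \xrightarrow{d} \sum_i \lambda_i (Z_i^2 - 1)$ if $\zeta_1 = 0$ (degenerate),

where the $Z_i$ are i.i.d. standard Gaussians and the $\lambda_i$ are the eigenvalues of $h$ under the data distribution. The theorem above shows that, once $d$ is allowed to grow with $n$, which of these two limits prevails is governed not by degeneracy (whether $\zeta_1 = 0$) but by a moment ratio.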
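As a loose illustration of the MMD application, the following minimal NumPy sketch computes the quadratic-time MMD U-statistic (itself a degree-two U-statistic) for two samples. The function names, Gaussian-kernel form, toy mean-shift alternative, and $\sqrt{d}$-scale bandwidth are illustrative assumptions, not the paper's experimental setup.

import numpy as np

def gaussian_kernel(x, y, bandwidth):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)), computed pairwise."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_u_statistic(x, y, bandwidth):
    """Unbiased (U-statistic) estimate of squared MMD between samples x and y."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude diagonal terms so the within-sample sums run over distinct pairs.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    term_xy = kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy

# Toy check: P = N(0, I_d) versus Q = N(mu, I_d) with a small mean shift.
rng = np.random.default_rng(0)
n, d = 200, 50
x = rng.standard_normal((n, d))
y = rng.standard_normal((n, d)) + 0.1          # mean shift in every coordinate
stat = mmd_u_statistic(x, y, bandwidth=np.sqrt(d))  # sqrt(d)-scale bandwidth
print(f"MMD^2 U-statistic: {stat:.3e}")

Rerunning such a sketch over a grid of $d$ and bandwidth values is the kind of experiment for which the paper's theory predicts how test power at a fixed threshold scales.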

Cite this Paper

BibTeX
@InProceedings{pmlr-v195-huang23a,
  title     = {A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing},
  author    = {Huang, Kevin H. and Liu, Xing and Duncan, Andrew and Gandy, Axel},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {3827--3918},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/huang23a/huang23a.pdf},
  url       = {https://proceedings.mlr.press/v195/huang23a.html},
  abstract  = {We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.}
}
Endnote
%0 Conference Paper
%T A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing
%A Kevin H. Huang
%A Xing Liu
%A Andrew Duncan
%A Axel Gandy
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-huang23a
%I PMLR
%P 3827--3918
%U https://proceedings.mlr.press/v195/huang23a.html
%V 195
%X We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.
APA
Huang, K. H., Liu, X., Duncan, A., & Gandy, A. (2023). A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:3827-3918. Available from https://proceedings.mlr.press/v195/huang23a.html.
