Collaborative non-parametric two-sample testing

Alejandro David De la Concha Duarte, Nicolas Vayatis, Argyris Kalogeratos
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:838-846, 2025.

Abstract

The multiple two-sample testing problem in a graph-structured setting is a common scenario in fields such as Spatial Statistics and Neuroscience. Each node $v$ in a fixed graph deals with a two-sample testing problem between two node-specific probability density functions, $p_v$ and $q_v$. The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected, under the assumption that connected nodes would yield similar test outcomes. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure and minimizes the assumptions over $p_v$ and $q_v$. CTST integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning. We use synthetic experiments and a real sensor network detecting seismic activity to demonstrate that CTST outperforms state-of-the-art non-parametric statistical tests that are applied at each node independently and hence disregard the geometry of the problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-concha-duarte25a,
  title = {Collaborative non-parametric two-sample testing},
  author = {la Concha Duarte, Alejandro David De and Vayatis, Nicolas and Kalogeratos, Argyris},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {838--846},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/concha-duarte25a/concha-duarte25a.pdf},
  url = {https://proceedings.mlr.press/v258/concha-duarte25a.html},
  abstract = {Multiple two-sample test problem in a graph-structured setting is a common scenario in fields such as Spatial Statistics and Neuroscience. Each node $v$ in fixed graph deals with a two-sample testing problem between two node-specific probability density functions, $p_v$ and $q_v$. The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected, under the assumption that connected nodes would yield similar test outcomes. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure and minimizes the assumptions over $p_v$ and $q_v$. CTST integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning. We use synthetic experiments and a real sensor network detecting seismic activity to demonstrate that CTST outperforms state-of-the-art non-parametric statistical tests that apply at each node independently, hence disregard the geometry of the problem.}
}
Endnote
%0 Conference Paper
%T Collaborative non-parametric two-sample testing
%A Alejandro David De la Concha Duarte
%A Nicolas Vayatis
%A Argyris Kalogeratos
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-concha-duarte25a
%I PMLR
%P 838--846
%U https://proceedings.mlr.press/v258/concha-duarte25a.html
%V 258
%X Multiple two-sample test problem in a graph-structured setting is a common scenario in fields such as Spatial Statistics and Neuroscience. Each node $v$ in fixed graph deals with a two-sample testing problem between two node-specific probability density functions, $p_v$ and $q_v$. The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected, under the assumption that connected nodes would yield similar test outcomes. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure and minimizes the assumptions over $p_v$ and $q_v$. CTST integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning. We use synthetic experiments and a real sensor network detecting seismic activity to demonstrate that CTST outperforms state-of-the-art non-parametric statistical tests that apply at each node independently, hence disregard the geometry of the problem.
APA
la Concha Duarte, A.D.D., Vayatis, N. & Kalogeratos, A. (2025). Collaborative non-parametric two-sample testing. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:838-846. Available from https://proceedings.mlr.press/v258/concha-duarte25a.html.

Related Material