On Ranking-based Tests of Independence

Myrto Limnios, Stéphan Clémençon
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:577-585, 2024.

Abstract

In this paper we develop a novel nonparametric framework to test the independence of two random variables $X$ and $Y$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dxdy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking. The rationale behind our approach relies on the fact that the independence hypothesis $\mathcal{H}_0$ is necessarily false as soon as the optimal scoring function related to the pair of distributions $(H\otimes G, F)$, obtained from a bipartite ranking algorithm, has a ROC curve that deviates from the main diagonal of the unit square. We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence. Beyond its great flexibility, this new method has theoretical properties that far surpass those of its competitors. Nonasymptotic bounds for the two types of testing errors are established. From an empirical perspective, the novel procedure we promote in this paper exhibits a remarkable ability to detect small departures, of various types, from the null assumption $\mathcal{H}_0$, even in high dimension, as supported by the numerical experiments presented here.
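To make the ranking idea concrete, here is a minimal sketch under our own assumptions; it is not the authors' procedure. Observed pairs $(X_i, Y_i)$ play the role of "positives" drawn from $F$, and pairs with $Y$ shuffled play the role of "negatives" mimicking $H\otimes G$. A scoring function is learned on one half of the data, and a linear rank statistic $W_\phi = \sum_{i \in \text{pos}} \phi(R_i/(N+1))$ is computed on the other half. With $\phi(u) = u$ this is the Wilcoxon rank-sum (Mann-Whitney) statistic. It is large when the scores' ROC curve lies above the diagonal. The names `pair_features`, `fit_centroid_scorer`, `linear_rank_statistic` and `ranking_independence_test` are hypothetical. The product feature map and the "centroid difference" scorer are simple stand-ins for a real bipartite ranking algorithm, and calibration by permuting test labels is one possible choice.

```python
import numpy as np


def pair_features(x, y):
    # Hypothetical feature map: all products x_j * y_k. A linear score on these
    # features only detects dependence visible in the cross-moments E[X_j Y_k].
    return np.einsum("ij,ik->ijk", x, y).reshape(len(x), -1)


def fit_centroid_scorer(pos, neg):
    # Stand-in for a bipartite ranking algorithm (assumption): score along the
    # direction from the negatives' centroid (H x G) to the positives' (F).
    w = pos.mean(axis=0) - neg.mean(axis=0)
    return lambda z: z @ w


def linear_rank_statistic(pos_scores, neg_scores, phi=lambda u: u):
    # W_phi = sum over positives of phi(R_i / (N + 1)), R_i = rank in the pooled
    # sample; phi(u) = u gives the Wilcoxon rank-sum statistic.
    pooled = np.concatenate([pos_scores, neg_scores])
    ranks = np.argsort(np.argsort(pooled)) + 1  # ties broken arbitrarily
    return float(np.sum(phi(ranks[: len(pos_scores)] / (len(pooled) + 1))))


def ranking_independence_test(x, y, n_perm=999, phi=lambda u: u, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    train, test = idx[: len(x) // 2], idx[len(x) // 2 :]

    # Training half: positives are observed pairs (~F); negatives re-pair X
    # with shuffled Y, which mimics the product of marginals H x G.
    pos_tr = pair_features(x[train], y[train])
    neg_tr = pair_features(x[train], y[rng.permutation(train)])
    score = fit_centroid_scorer(pos_tr, neg_tr)

    # Test half: disjoint index sets for positives and shuffled negatives, so
    # under H0 all test pairs are i.i.d. and the labels are exchangeable.
    m = len(test) // 2
    t_pos, t_neg = test[:m], test[m:]
    s_pos = score(pair_features(x[t_pos], y[t_pos]))
    s_neg = score(pair_features(x[t_neg], y[rng.permutation(t_neg)]))
    stat = linear_rank_statistic(s_pos, s_neg, phi)

    # One-sided p-value: permute the positive/negative labels of test scores.
    pooled = np.concatenate([s_pos, s_neg])
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        null.append(linear_rank_statistic(perm[:m], perm[m:], phi))
    p_value = (1 + np.sum(np.array(null) >= stat)) / (n_perm + 1)
    return stat, float(p_value)
```

The scorer is fit on data disjoint from the test half, and under $\mathcal{H}_0$ every test pair is an i.i.d. draw from $H\otimes G$. Permuting the labels therefore gives a valid level for any choice of $\phi$. Other choices of $\phi$ weight different regions of the ROC curve, for example $\phi(u) = u\,\mathbb{1}\{u \ge u_0\}$, which emphasizes the top of the ranking. How much power the test has depends entirely on how good the learned scorer is.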

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-limnios24a,
  title = {On Ranking-based Tests of Independence},
  author = {Limnios, Myrto and Cl\'{e}men\c{c}on, St\'{e}phan},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = {577--585},
  year = {2024},
  editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = {238},
  series = {Proceedings of Machine Learning Research},
  month = {02--04 May},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v238/limnios24a/limnios24a.pdf},
  url = {https://proceedings.mlr.press/v238/limnios24a.html},
  abstract = {In this paper we develop a novel nonparametric framework to test the independence of two random variables $X$ and $Y$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dxdy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking. The rationale behind our approach relies on the fact that the independence hypothesis $\mathcal{H}_0$ is necessarily false as soon as the optimal scoring function related to the pair of distributions $(H\otimes G, F)$, obtained from a bipartite ranking algorithm, has a ROC curve that deviates from the main diagonal of the unit square. We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence. Beyond its great flexibility, this new method has theoretical properties that far surpass those of its competitors. Nonasymptotic bounds for the two types of testing errors are established. From an empirical perspective, the novel procedure we promote in this paper exhibits a remarkable ability to detect small departures, of various types, from the null assumption $\mathcal{H}_0$, even in high dimension, as supported by the numerical experiments presented here.}
}
Endnote
%0 Conference Paper
%T On Ranking-based Tests of Independence
%A Myrto Limnios
%A Stéphan Clémençon
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-limnios24a
%I PMLR
%P 577--585
%U https://proceedings.mlr.press/v238/limnios24a.html
%V 238
%X In this paper we develop a novel nonparametric framework to test the independence of two random variables $X$ and $Y$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dxdy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking. The rationale behind our approach relies on the fact that the independence hypothesis $\mathcal{H}_0$ is necessarily false as soon as the optimal scoring function related to the pair of distributions $(H\otimes G, F)$, obtained from a bipartite ranking algorithm, has a ROC curve that deviates from the main diagonal of the unit square. We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence. Beyond its great flexibility, this new method has theoretical properties that far surpass those of its competitors. Nonasymptotic bounds for the two types of testing errors are established. From an empirical perspective, the novel procedure we promote in this paper exhibits a remarkable ability to detect small departures, of various types, from the null assumption $\mathcal{H}_0$, even in high dimension, as supported by the numerical experiments presented here.
APA
Limnios, M., & Clémençon, S. (2024). On Ranking-based Tests of Independence. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:577-585. Available from https://proceedings.mlr.press/v238/limnios24a.html.
