A low variance consistent test of relative dependency

Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:20-29, 2015.

Abstract

We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or on a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we show that tumor location is more dependent on gene expression than on chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep/.
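To make the comparison concrete, the sketch below computes a pair of empirical HSIC statistics with Gaussian kernels and compares them. This is only the dependence-measurement step, not the paper's full test: the significance calculation, which models the covariance between the two HSIC statistics, is omitted, and the biased quadratic-time estimator is used for brevity. All variable names and the fixed bandwidth are illustrative assumptions, not from the paper.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Pairwise squared Euclidean distances, then Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    # Biased empirical HSIC: trace(K H L H) / (n - 1)^2,
    # where H = I - 11^T / n is the centering matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))               # source variable
Y1 = X + 0.1 * rng.normal(size=(n, 1))    # strongly dependent on X
Y2 = rng.normal(size=(n, 1))              # independent of X

K = gaussian_kernel(X)
hsic1 = hsic(K, gaussian_kernel(Y1))      # dependence(source, target 1)
hsic2 = hsic(K, gaussian_kernel(Y2))      # dependence(source, target 2)
print(hsic1 > hsic2)                      # expect True: X depends more on Y1
```

In the actual test, the null hypothesis HSIC(X, Y1) ≤ HSIC(X, Y2) would be rejected only if the difference exceeds a threshold derived from the joint asymptotic distribution of the two statistics, including their covariance.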

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-bounliphone15,
  title     = {A low variance consistent test of relative dependency},
  author    = {Bounliphone, Wacha and Gretton, Arthur and Tenenhaus, Arthur and Blaschko, Matthew},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {20--29},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/bounliphone15.pdf},
  url       = {https://proceedings.mlr.press/v37/bounliphone15.html},
  abstract  = {We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we prove that tumor location is more dependent on gene expression than chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep/.}
}
Endnote
%0 Conference Paper
%T A low variance consistent test of relative dependency
%A Wacha Bounliphone
%A Arthur Gretton
%A Arthur Tenenhaus
%A Matthew Blaschko
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-bounliphone15
%I PMLR
%P 20--29
%U https://proceedings.mlr.press/v37/bounliphone15.html
%V 37
%X We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we prove that tumor location is more dependent on gene expression than chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep/.
RIS
TY  - CPAPER
TI  - A low variance consistent test of relative dependency
AU  - Wacha Bounliphone
AU  - Arthur Gretton
AU  - Arthur Tenenhaus
AU  - Matthew Blaschko
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-bounliphone15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 20
EP  - 29
L1  - http://proceedings.mlr.press/v37/bounliphone15.pdf
UR  - https://proceedings.mlr.press/v37/bounliphone15.html
AB  - We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we prove that tumor location is more dependent on gene expression than chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep/.
ER  -
APA
Bounliphone, W., Gretton, A., Tenenhaus, A. &amp; Blaschko, M. (2015). A low variance consistent test of relative dependency. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:20-29. Available from https://proceedings.mlr.press/v37/bounliphone15.html.