Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering

Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat
Proceedings of the Time Series Workshop at NIPS 2016, PMLR 55:59-69, 2017.

Abstract

We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a novel dependence coefficient which can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
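
The first two steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes that the "lightspeed transportation" of the title refers to entropy-regularized (Sinkhorn) optimal transport, and the function names, the bin count, and the regularization strength `reg` are all hypothetical choices made for illustration. The sketch builds a discretized empirical copula from the ranks of two variables, then compares two such copulas with a Sinkhorn transport cost. The clustering step is omitted.

```python
import math
import random


def rank_transform(values):
    """Map each value to its normalized rank in (0, 1), an empirical CDF value."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = (r + 0.5) / n
    return ranks


def empirical_copula_histogram(x, y, bins=4):
    """Flattened bins x bins histogram of the rank-transformed sample (sums to 1)."""
    u, v = rank_transform(x), rank_transform(y)
    hist = [0.0] * (bins * bins)
    for ui, vi in zip(u, v):
        a = min(int(ui * bins), bins - 1)
        b = min(int(vi * bins), bins - 1)
        hist[a * bins + b] += 1.0 / len(x)
    return hist


def sinkhorn_cost(p, q, bins=4, reg=0.05, iters=500):
    """Transport cost between two copula histograms under entropic regularization.

    The ground cost is the squared Euclidean distance between bin centers.
    `reg` and `iters` are illustrative values, not taken from the paper.
    """
    centers = [((i + 0.5) / bins, (j + 0.5) / bins)
               for i in range(bins) for j in range(bins)]
    n = len(centers)
    cost = [[(a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 for b in centers]
            for a in centers]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    tiny = 1e-300  # guards against division by zero
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [p[i] / max(sum(kernel[i][j] * v[j] for j in range(n)), tiny)
             for i in range(n)]
        v = [q[j] / max(sum(kernel[i][j] * u[i] for i in range(n)), tiny)
             for j in range(n)]
    # Cost of the resulting transport plan u_i * K_ij * v_j.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(n))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
increasing = empirical_copula_histogram(x, x)               # y = x
decreasing = empirical_copula_histogram(x, [-t for t in x])  # y = -x

same = sinkhorn_cost(increasing, increasing)
opposite = sinkhorn_cost(increasing, decreasing)
print(same, opposite)
```

Because the regularization blurs the transport plan, the cost between a copula and itself is small but not exactly zero, while the cost between the increasing and decreasing dependence patterns is clearly larger. A pairwise matrix of such costs is the kind of input a clustering step could then summarize.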

Cite this Paper


BibTeX
@InProceedings{pmlr-v55-marti16,
  title = {Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering},
  author = {Marti, Gautier and Andler, Sébastien and Nielsen, Frank and Donnat, Philippe},
  booktitle = {Proceedings of the Time Series Workshop at NIPS 2016},
  pages = {59--69},
  year = {2017},
  editor = {Anava, Oren and Khaleghi, Azadeh and Cuturi, Marco and Kuznetsov, Vitaly and Rakhlin, Alexander},
  volume = {55},
  series = {Proceedings of Machine Learning Research},
  address = {Barcelona, Spain},
  month = {09 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v55/marti16.pdf},
  url = {https://proceedings.mlr.press/v55/marti16.html},
  abstract = {We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the clusters centers can be used to parameterize a novel dependence coefficient which can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.}
}
Endnote
%0 Conference Paper
%T Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering
%A Gautier Marti
%A Sébastien Andler
%A Frank Nielsen
%A Philippe Donnat
%B Proceedings of the Time Series Workshop at NIPS 2016
%C Proceedings of Machine Learning Research
%D 2017
%E Oren Anava
%E Azadeh Khaleghi
%E Marco Cuturi
%E Vitaly Kuznetsov
%E Alexander Rakhlin
%F pmlr-v55-marti16
%I PMLR
%P 59--69
%U https://proceedings.mlr.press/v55/marti16.html
%V 55
%X We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the clusters centers can be used to parameterize a novel dependence coefficient which can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
RIS
TY - CPAPER
TI - Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering
AU - Gautier Marti
AU - Sébastien Andler
AU - Frank Nielsen
AU - Philippe Donnat
BT - Proceedings of the Time Series Workshop at NIPS 2016
DA - 2017/02/16
ED - Oren Anava
ED - Azadeh Khaleghi
ED - Marco Cuturi
ED - Vitaly Kuznetsov
ED - Alexander Rakhlin
ID - pmlr-v55-marti16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 55
SP - 59
EP - 69
L1 - http://proceedings.mlr.press/v55/marti16.pdf
UR - https://proceedings.mlr.press/v55/marti16.html
AB - We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the clusters centers can be used to parameterize a novel dependence coefficient which can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
ER -
APA
Marti, G., Andler, S., Nielsen, F. & Donnat, P. (2017). Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering. Proceedings of the Time Series Workshop at NIPS 2016, in Proceedings of Machine Learning Research 55:59-69. Available from https://proceedings.mlr.press/v55/marti16.html.
