Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering

Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat
Proceedings of the Time Series Workshop at NIPS 2016, PMLR 55:59-69, 2017.

Abstract

We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a novel dependence coefficient that can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
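The sketch below illustrates the first two ingredients of this pipeline under stated assumptions. It is not the authors' code; their code is at the URL above. Each pair of variables is encoded by its empirical copula, meaning its normalized ranks binned on a 5 × 5 grid. Two copulas are then compared with an entropically regularized optimal transport (Sinkhorn) cost, using a squared Euclidean ground cost between bin centers. The function names, the grid size, the regularization `reg` and the iteration count are all illustrative choices. The clustering step and the dependence coefficient are not shown.

```python
import math
import random


def empirical_copula(x, y, bins=5):
    """Bin the pseudo-observations (normalized ranks) of (x, y) on a bins x bins grid.

    Returns a flat list of bins * bins probability masses summing to 1.
    Ties are broken by index order (assumption: continuous data).
    """
    n = len(x)

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    hist = [0.0] * (bins * bins)
    for a, b in zip(rx, ry):
        # A rank in [0, n-1] maps to a bin index in [0, bins-1].
        hist[(a * bins // n) * bins + (b * bins // n)] += 1.0 / n
    return hist


def grid_cost(bins):
    """Squared Euclidean cost between the bin centers of the unit square."""
    centers = [((i + 0.5) / bins, (j + 0.5) / bins)
               for i in range(bins) for j in range(bins)]
    return [[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in centers]
            for p in centers]


def sinkhorn_distance(a, b, cost, reg=0.05, iters=200):
    """Transport cost of the entropically regularized OT plan between histograms a and b."""
    m = len(a)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(m)]
    u, v = [1.0] * m, [1.0] * m
    for _ in range(iters):
        # Alternate scalings so that the plan diag(u) K diag(v) matches both marginals.
        kv = [sum(kernel[i][j] * v[j] for j in range(m)) for i in range(m)]
        u = [a[i] / kv[i] for i in range(m)]
        ktu = [sum(kernel[i][j] * u[i] for i in range(m)) for j in range(m)]
        v = [b[j] / ktu[j] for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(m) for j in range(m))


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(1000)]
    y_pos = [xi ** 3 + 0.1 * random.gauss(0.0, 1.0) for xi in x]  # monotone, non-linear
    y_neg = [-xi for xi in x]                                     # countermonotone
    cost = grid_cost(5)
    c_pos = empirical_copula(x, y_pos)
    c_neg = empirical_copula(x, y_neg)
    print(sinkhorn_distance(c_pos, c_pos, cost), sinkhorn_distance(c_pos, c_neg, cost))
```

The copula depends only on ranks. As a result, the cubic relation in `y_pos` produces nearly the same copula as a linear one, while the countermonotone pair lies far away in transport cost. A pairwise matrix of such distances is the kind of input a clustering algorithm could then summarize.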

Cite this Paper


BibTeX
@InProceedings{pmlr-v55-marti16,
  title = {Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering},
  author = {Gautier Marti and Sébastien Andler and Frank Nielsen and Philippe Donnat},
  booktitle = {Proceedings of the Time Series Workshop at NIPS 2016},
  pages = {59--69},
  year = {2017},
  editor = {Oren Anava and Azadeh Khaleghi and Marco Cuturi and Vitaly Kuznetsov and Alexander Rakhlin},
  volume = {55},
  series = {Proceedings of Machine Learning Research},
  address = {Barcelona, Spain},
  month = {09 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v55/marti16.pdf},
  url = {http://proceedings.mlr.press/v55/marti16.html},
  abstract = {We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a novel dependence coefficient that can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.}
}
Endnote
%0 Conference Paper
%T Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering
%A Gautier Marti
%A Sébastien Andler
%A Frank Nielsen
%A Philippe Donnat
%B Proceedings of the Time Series Workshop at NIPS 2016
%C Proceedings of Machine Learning Research
%D 2017
%E Oren Anava
%E Azadeh Khaleghi
%E Marco Cuturi
%E Vitaly Kuznetsov
%E Alexander Rakhlin
%F pmlr-v55-marti16
%I PMLR
%J Proceedings of Machine Learning Research
%P 59--69
%U http://proceedings.mlr.press
%V 55
%W PMLR
%X We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a novel dependence coefficient that can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
RIS
TY  - CPAPER
TI  - Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering
AU  - Gautier Marti
AU  - Sébastien Andler
AU  - Frank Nielsen
AU  - Philippe Donnat
BT  - Proceedings of the Time Series Workshop at NIPS 2016
PY  - 2017/02/16
DA  - 2017/02/16
ED  - Oren Anava
ED  - Azadeh Khaleghi
ED  - Marco Cuturi
ED  - Vitaly Kuznetsov
ED  - Alexander Rakhlin
ID  - pmlr-v55-marti16
PB  - PMLR
SP  - 59
DP  - PMLR
EP  - 69
L1  - http://proceedings.mlr.press/v55/marti16.pdf
UR  - http://proceedings.mlr.press/v55/marti16.html
AB  - We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a novel dependence coefficient that can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
ER  - 
APA
Marti, G., Andler, S., Nielsen, F. & Donnat, P. (2017). Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering. Proceedings of the Time Series Workshop at NIPS 2016, in PMLR 55:59-69.
