Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity

Mohsen Ferdosi, Arash Gholamidavoodi, Hosein Mohimani
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4399-4409, 2020.

Abstract

Finding associations between pairs of variables in large datasets is crucial for various disciplines. The brute force method for solving this problem requires computing the mutual information between $\binom{N}{2}$ pairs. In this paper, we consider the problem of finding pairs of variables with high mutual information in sub-quadratic complexity. This problem is analogous to the nearest neighbor search, where the goal is to find pairs among $N$ variables that are similar to each other. To solve this problem, we develop a new algorithm for finding associations based on constructing a decision tree that assigns a hash to each variable, such that pairs with higher mutual information are more likely to share the same hash. For any $1 \leq \lambda \leq 2$, we prove that in the case of binary data, we can reduce the number of necessary mutual information computations for finding all pairs satisfying $I(X, Y) > 2 - \lambda$ from $O(N^2)$ to $O(N^\lambda)$, where $I(X,Y)$ is the empirical mutual information between variables $X$ and $Y$. Finally, we confirm our theory by experiments on simulated and real data. The implementation of our method and experiments is publicly available at https://github.com/mohimanilab/HashMI.
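
To make the idea of "hash first, then compute mutual information only inside buckets" concrete, here is a minimal Python sketch for binary data. It is not the decision-tree construction from the paper or the HashMI code. The hash used here (the values of a variable at a few randomly chosen sample positions, flipped so that a variable and its complement get the same key) is a simplified stand-in chosen for illustration, and the function names and the parameters `n_bits` and `n_tables` are hypothetical. Unlike the paper's method, it comes with no guarantee of finding every pair above the threshold.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations


def empirical_mi(x, y):
    """Empirical mutual information (in bits) of two equal-length sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def brute_force_pairs(variables, threshold):
    """Baseline: evaluate the mutual information of all C(N, 2) pairs."""
    return {
        (i, j)
        for i, j in combinations(range(len(variables)), 2)
        if empirical_mi(variables[i], variables[j]) > threshold
    }


def bucketed_pairs(variables, threshold, n_bits=8, n_tables=10, seed=0):
    """Illustrative stand-in hash (NOT the paper's decision tree).

    Each table keys a 0/1 variable by its values at `n_bits` random sample
    positions. Mutual information is computed only for pairs that share a
    bucket in at least one table.
    Returns (pairs above threshold, number of MI evaluations performed).
    """
    rng = random.Random(seed)
    n_samples = len(variables[0])
    found, evaluated = set(), set()
    for _ in range(n_tables):
        positions = rng.sample(range(n_samples), n_bits)
        buckets = defaultdict(list)
        for idx, v in enumerate(variables):
            key = tuple(v[p] for p in positions)
            if key[0] == 1:
                # Canonicalize so a variable and its complement collide;
                # both relationships carry high mutual information.
                key = tuple(1 - bit for bit in key)
            buckets[key].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                if (i, j) in evaluated:
                    continue
                evaluated.add((i, j))
                if empirical_mi(variables[i], variables[j]) > threshold:
                    found.add((i, j))
    return found, len(evaluated)
```

Calling `bucketed_pairs(variables, threshold)` on a list of equal-length 0/1 lists returns the pairs it found together with the number of mutual information evaluations it performed, which can be compared against the $\binom{N}{2}$ evaluations made by `brute_force_pairs`.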

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-ferdosi20a,
  title = {Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity},
  author = {Ferdosi, Mohsen and Gholamidavoodi, Arash and Mohimani, Hosein},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {4399--4409},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/ferdosi20a/ferdosi20a.pdf},
  url = {http://proceedings.mlr.press/v108/ferdosi20a.html},
  abstract = {Finding associations between pairs of variables in large datasets is crucial for various disciplines. The brute force method for solving this problem requires computing the mutual information between $\binom{N}{2}$ pairs. In this paper, we consider the problem of finding pairs of variables with high mutual information in sub-quadratic complexity. This problem is analogous to the nearest neighbor search, where the goal is to find pairs among $N$ variables that are similar to each other. To solve this problem, we develop a new algorithm for finding associations based on constructing a decision tree that assigns a hash to each variable, in a way that for pairs with higher mutual information, the chance of having the same hash is higher. For any $1 \leq \lambda \leq 2$, we prove that in the case of binary data, we can reduce the number of necessary mutual information computations for finding all pairs satisfying $I(X, Y) > 2- \lambda$ from $O(N^2)$ to $O(N^\lambda)$, where $I(X,Y)$ is the empirical mutual information between variables $X$ and $Y$. Finally, we confirmed our theory by experiments on simulated and real data. The implementation of our method and experiments is publicly available at \href{https://github.com/mohimanilab/HashMI}{https://github.com/mohimanilab/HashMI}.}
}
Endnote
%0 Conference Paper
%T Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity
%A Mohsen Ferdosi
%A Arash Gholamidavoodi
%A Hosein Mohimani
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-ferdosi20a
%I PMLR
%P 4399--4409
%U http://proceedings.mlr.press/v108/ferdosi20a.html
%V 108
%X Finding associations between pairs of variables in large datasets is crucial for various disciplines. The brute force method for solving this problem requires computing the mutual information between $\binom{N}{2}$ pairs. In this paper, we consider the problem of finding pairs of variables with high mutual information in sub-quadratic complexity. This problem is analogous to the nearest neighbor search, where the goal is to find pairs among $N$ variables that are similar to each other. To solve this problem, we develop a new algorithm for finding associations based on constructing a decision tree that assigns a hash to each variable, in a way that for pairs with higher mutual information, the chance of having the same hash is higher. For any $1 \leq \lambda \leq 2$, we prove that in the case of binary data, we can reduce the number of necessary mutual information computations for finding all pairs satisfying $I(X, Y) > 2- \lambda$ from $O(N^2)$ to $O(N^\lambda)$, where $I(X,Y)$ is the empirical mutual information between variables $X$ and $Y$. Finally, we confirmed our theory by experiments on simulated and real data. The implementation of our method and experiments is publicly available at \href{https://github.com/mohimanilab/HashMI}{https://github.com/mohimanilab/HashMI}.
APA
Ferdosi, M., Gholamidavoodi, A. & Mohimani, H. (2020). Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4399-4409. Available from http://proceedings.mlr.press/v108/ferdosi20a.html
