Two-way Sparse Network Inference for Count Data

Sijia Li, Martı́n López-Garcı́a, Neil D. Lawrence, Luisa Cutillo
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10924-10938, 2022.

Abstract

Classically, statistical datasets have a larger number of data points than features ($n > p$). The standard model of classical statistics caters for the case where data points are considered conditionally independent given the parameters. However, for $n \approx p$ or $p > n$ such models are poorly determined. Kalaitzis et al. (2013) introduced the Bigraphical Lasso, an estimator for sparse precision matrices based on the Cartesian product of graphs. Unfortunately, the original Bigraphical Lasso algorithm is not applicable in case of large $p$ and $n$ due to memory requirements. We exploit eigenvalue decomposition of the Cartesian product graph to present a more efficient version of the algorithm which reduces memory requirements from $O(n^2p^2)$ to $O(n^2 +p^2)$. Many datasets in different application fields, such as biology, medicine and social science, come with count data, for which Gaussian based models are not applicable. Our multiway network inference approach can be used for discrete data. Our methodology accounts for the dependencies across both instances and features, reduces the computational complexity for high dimensional data and enables to deal with both discrete and continuous data. Numerical studies on both synthetic and real datasets are presented to showcase the performance of our method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-li22g, title = { Two-way Sparse Network Inference for Count Data }, author = {Li, Sijia and L\'opez-Garc{\'\i}a, Mart{\'\i}n and Lawrence, Neil D. and Cutillo, Luisa}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, pages = {10924--10938}, year = {2022}, editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel}, volume = {151}, series = {Proceedings of Machine Learning Research}, month = {28--30 Mar}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v151/li22g/li22g.pdf}, url = {https://proceedings.mlr.press/v151/li22g.html}, abstract = { Classically, statistical datasets have a larger number of data points than features ($n > p$). The standard model of classical statistics caters for the case where data points are considered conditionally independent given the parameters. However, for $n \approx p$ or $p > n$ such models are poorly determined. Kalaitzis et al. (2013) introduced the Bigraphical Lasso, an estimator for sparse precision matrices based on the Cartesian product of graphs. Unfortunately, the original Bigraphical Lasso algorithm is not applicable in case of large $p$ and $n$ due to memory requirements. We exploit eigenvalue decomposition of the Cartesian product graph to present a more efficient version of the algorithm which reduces memory requirements from $O(n^2p^2)$ to $O(n^2 +p^2)$. Many datasets in different application fields, such as biology, medicine and social science, come with count data, for which Gaussian based models are not applicable. Our multiway network inference approach can be used for discrete data. Our methodology accounts for the dependencies across both instances and features, reduces the computational complexity for high dimensional data and enables to deal with both discrete and continuous data. Numerical studies on both synthetic and real datasets are presented to showcase the performance of our method. } }
Endnote
%0 Conference Paper %T Two-way Sparse Network Inference for Count Data %A Sijia Li %A Martı́n López-Garcı́a %A Neil D. Lawrence %A Luisa Cutillo %B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2022 %E Gustau Camps-Valls %E Francisco J. R. Ruiz %E Isabel Valera %F pmlr-v151-li22g %I PMLR %P 10924--10938 %U https://proceedings.mlr.press/v151/li22g.html %V 151 %X Classically, statistical datasets have a larger number of data points than features ($n > p$). The standard model of classical statistics caters for the case where data points are considered conditionally independent given the parameters. However, for $n \approx p$ or $p > n$ such models are poorly determined. Kalaitzis et al. (2013) introduced the Bigraphical Lasso, an estimator for sparse precision matrices based on the Cartesian product of graphs. Unfortunately, the original Bigraphical Lasso algorithm is not applicable in case of large $p$ and $n$ due to memory requirements. We exploit eigenvalue decomposition of the Cartesian product graph to present a more efficient version of the algorithm which reduces memory requirements from $O(n^2p^2)$ to $O(n^2 +p^2)$. Many datasets in different application fields, such as biology, medicine and social science, come with count data, for which Gaussian based models are not applicable. Our multiway network inference approach can be used for discrete data. Our methodology accounts for the dependencies across both instances and features, reduces the computational complexity for high dimensional data and enables to deal with both discrete and continuous data. Numerical studies on both synthetic and real datasets are presented to showcase the performance of our method.
APA
Li, S., López-Garcı́a, M., Lawrence, N.D. & Cutillo, L.. (2022). Two-way Sparse Network Inference for Count Data . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:10924-10938 Available from https://proceedings.mlr.press/v151/li22g.html.

Related Material