Optimal partition recovery in general graphs

Yi Yu, Oscar Madrid, Alessandro Rinaldo
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:4339-4358, 2022.

Abstract

We consider a graph-structured change point problem in which we observe a random vector with piece-wise constant but otherwise unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector or, equivalently, of estimating the cut separating the sub-graphs over which the mean remains constant. Although graph-valued signals of this type have been previously studied in the literature for the different tasks of testing for the presence of an anomalous cluster and of estimating the mean vector, no localisation results are known outside the classical case of chain graphs. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $\sigma^2$, the size $\Delta$ of the smaller element of the partition, the magnitude $\kappa$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut – a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. On the other hand, when $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim \zeta_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $\zeta_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squared estimator delivers a localisation error – measured by the symmetric difference between the true and estimated partition – of order $ \kappa^{-2} \sigma^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-yu22b, title = { Optimal partition recovery in general graphs }, author = {Yu, Yi and Madrid, Oscar and Rinaldo, Alessandro}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, pages = {4339--4358}, year = {2022}, editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel}, volume = {151}, series = {Proceedings of Machine Learning Research}, month = {28--30 Mar}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v151/yu22b/yu22b.pdf}, url = {https://proceedings.mlr.press/v151/yu22b.html}, abstract = { We consider a graph-structured change point problem in which we observe a random vector with piece-wise constant but otherwise unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector or, equivalently, of estimating the cut separating the sub-graphs over which the mean remains constant. Although graph-valued signals of this type have been previously studied in the literature for the different tasks of testing for the presence of an anomalous cluster and of estimating the mean vector, no localisation results are known outside the classical case of chain graphs. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $\sigma^2$, the size $\Delta$ of the smaller element of the partition, the magnitude $\kappa$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut – a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. On the other hand, when $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim \zeta_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $\zeta_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squared estimator delivers a localisation error – measured by the symmetric difference between the true and estimated partition – of order $ \kappa^{-2} \sigma^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes. } }
Endnote
%0 Conference Paper %T Optimal partition recovery in general graphs %A Yi Yu %A Oscar Madrid %A Alessandro Rinaldo %B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2022 %E Gustau Camps-Valls %E Francisco J. R. Ruiz %E Isabel Valera %F pmlr-v151-yu22b %I PMLR %P 4339--4358 %U https://proceedings.mlr.press/v151/yu22b.html %V 151 %X We consider a graph-structured change point problem in which we observe a random vector with piece-wise constant but otherwise unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector or, equivalently, of estimating the cut separating the sub-graphs over which the mean remains constant. Although graph-valued signals of this type have been previously studied in the literature for the different tasks of testing for the presence of an anomalous cluster and of estimating the mean vector, no localisation results are known outside the classical case of chain graphs. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $\sigma^2$, the size $\Delta$ of the smaller element of the partition, the magnitude $\kappa$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut – a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. On the other hand, when $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim \zeta_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $\zeta_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squared estimator delivers a localisation error – measured by the symmetric difference between the true and estimated partition – of order $ \kappa^{-2} \sigma^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes.
APA
Yu, Y., Madrid, O. & Rinaldo, A.. (2022). Optimal partition recovery in general graphs . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:4339-4358 Available from https://proceedings.mlr.press/v151/yu22b.html.

Related Material