A heuristic for statistical seriation

Komal Dhull, Jingyan Wang, Nihar B. Shah, Yuanzhi Li, R. Ravi
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:621-631, 2021.

Abstract

We study the statistical seriation problem, where the goal is to estimate a matrix whose rows satisfy the same shape constraint after a permutation of the columns. This is an important classical problem, with close connections to the statistical literature on permutation-based models, and it has wide applications ranging from archaeology to biology. Specifically, we consider the case where the rows are monotonically increasing after an unknown permutation of the columns. Past work has shown that the least-squares estimator is optimal up to logarithmic factors, but efficient algorithms for computing the least-squares estimator remain unknown to date. We approach this important problem from a heuristic perspective. Specifically, we replace the combinatorial permutation constraint with a continuous regularization term, and then use projected gradient descent to obtain a local minimum of the non-convex objective. We show that the attained local minimum is the global minimum in certain special cases under the noiseless setting, and that it preserves desirable properties under the noisy setting. Simulation results reveal that our proposed algorithm outperforms prior algorithms when (1) the underlying model is more complex than simplistic parametric assumptions such as low-rankedness, or (2) the signal-to-noise ratio is high. Under partial observations, the proposed algorithm requires an initialization, and different initializations may lead to different local minima. We empirically observe that the proposed algorithm yields consistent improvement over the initialization, even though different initializations start with different levels of quality.
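The abstract describes the approach only at a high level: swap the combinatorial permutation constraint for a continuous regularization term and run projected gradient descent on the resulting non-convex objective. The sketch below is a minimal illustration of that general recipe, not the authors' algorithm; the quadratic penalty on decreases, the row-wise isotonic projection, the mean-based column ordering, and all parameter values (step, lam, n_iters) are assumptions made purely for illustration.

    import numpy as np
    from sklearn.isotonic import isotonic_regression

    def seriation_pgd(Y, n_iters=200, step=0.1, lam=1.0):
        """Heuristic projected-gradient sketch for statistical seriation.

        Y   : observed n x d data matrix (noisy, columns in an unknown order)
        lam : weight of the soft ordering penalty (illustrative choice)
        """
        X = Y.astype(float).copy()            # initialize the estimate at the observations
        order = np.argsort(X.mean(axis=0))    # crude initial guess for the column ordering
        for _ in range(n_iters):
            # Gradient of the least-squares term 0.5 * ||X - Y||_F^2.
            grad = X - Y
            # Smooth surrogate for the permutation constraint: penalize decreases
            # between consecutive columns under the current ordering,
            # i.e. 0.5 * sum(min(diff, 0)^2), and add its gradient.
            diffs = np.diff(X[:, order], axis=1)
            viol = np.minimum(diffs, 0.0)
            pen_grad = np.zeros_like(X)
            pen_grad[:, order[1:]] += viol
            pen_grad[:, order[:-1]] -= viol
            X = X - step * (grad + lam * pen_grad)
            # Projection step: make every row non-decreasing under the current ordering.
            for i in range(X.shape[0]):
                X[i, order] = isotonic_regression(X[i, order])
            # Heuristically refresh the column ordering from the current iterate.
            order = np.argsort(X.mean(axis=0))
        return X, order

    # Example usage on a synthetic instance: a monotone matrix with shuffled columns plus noise.
    rng = np.random.default_rng(0)
    truth = np.sort(rng.normal(size=(30, 20)), axis=1)       # rows increasing
    perm = rng.permutation(20)
    Y = truth[:, perm] + 0.1 * rng.normal(size=truth.shape)  # observed matrix
    X_hat, est_order = seriation_pgd(Y)

The soft penalty plays the role of the continuous regularization term mentioned in the abstract, while the isotonic step is the projection; how the paper actually couples the two, and how it handles partial observations and initialization, is specified in the full text rather than here.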

Cite this Paper


BibTeX
@InProceedings{pmlr-v161-dhull21a,
  title     = {A heuristic for statistical seriation},
  author    = {Dhull, Komal and Wang, Jingyan and Shah, Nihar B. and Li, Yuanzhi and Ravi, R.},
  booktitle = {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages     = {621--631},
  year      = {2021},
  editor    = {de Campos, Cassio and Maathuis, Marloes H.},
  volume    = {161},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v161/dhull21a/dhull21a.pdf},
  url       = {https://proceedings.mlr.press/v161/dhull21a.html},
  abstract  = {We study the statistical seriation problem, where the goal is to estimate a matrix whose rows satisfy the same shape constraint after a permutation of the columns. This is an important classical problem, with close connections to the statistical literature on permutation-based models, and it has wide applications ranging from archaeology to biology. Specifically, we consider the case where the rows are monotonically increasing after an unknown permutation of the columns. Past work has shown that the least-squares estimator is optimal up to logarithmic factors, but efficient algorithms for computing the least-squares estimator remain unknown to date. We approach this important problem from a heuristic perspective. Specifically, we replace the combinatorial permutation constraint with a continuous regularization term, and then use projected gradient descent to obtain a local minimum of the non-convex objective. We show that the attained local minimum is the global minimum in certain special cases under the noiseless setting, and that it preserves desirable properties under the noisy setting. Simulation results reveal that our proposed algorithm outperforms prior algorithms when (1) the underlying model is more complex than simplistic parametric assumptions such as low-rankedness, or (2) the signal-to-noise ratio is high. Under partial observations, the proposed algorithm requires an initialization, and different initializations may lead to different local minima. We empirically observe that the proposed algorithm yields consistent improvement over the initialization, even though different initializations start with different levels of quality.}
}
Endnote
%0 Conference Paper
%T A heuristic for statistical seriation
%A Komal Dhull
%A Jingyan Wang
%A Nihar B. Shah
%A Yuanzhi Li
%A R. Ravi
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis
%F pmlr-v161-dhull21a
%I PMLR
%P 621--631
%U https://proceedings.mlr.press/v161/dhull21a.html
%V 161
%X We study the statistical seriation problem, where the goal is to estimate a matrix whose rows satisfy the same shape constraint after a permutation of the columns. This is an important classical problem, with close connections to the statistical literature on permutation-based models, and it has wide applications ranging from archaeology to biology. Specifically, we consider the case where the rows are monotonically increasing after an unknown permutation of the columns. Past work has shown that the least-squares estimator is optimal up to logarithmic factors, but efficient algorithms for computing the least-squares estimator remain unknown to date. We approach this important problem from a heuristic perspective. Specifically, we replace the combinatorial permutation constraint with a continuous regularization term, and then use projected gradient descent to obtain a local minimum of the non-convex objective. We show that the attained local minimum is the global minimum in certain special cases under the noiseless setting, and that it preserves desirable properties under the noisy setting. Simulation results reveal that our proposed algorithm outperforms prior algorithms when (1) the underlying model is more complex than simplistic parametric assumptions such as low-rankedness, or (2) the signal-to-noise ratio is high. Under partial observations, the proposed algorithm requires an initialization, and different initializations may lead to different local minima. We empirically observe that the proposed algorithm yields consistent improvement over the initialization, even though different initializations start with different levels of quality.
APA
Dhull, K., Wang, J., Shah, N.B., Li, Y. & Ravi, R. (2021). A heuristic for statistical seriation. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:621-631. Available from https://proceedings.mlr.press/v161/dhull21a.html.