Optimal Estimator for Unlabeled Linear Regression

Hang Zhang, Ping Li
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:11153-11162, 2020.

Abstract

Unlabeled linear regression, or “linear regression with an unknown permutation”, has attracted increasing attentions due to its applications in (e.g.,) linkage record and de-anonymization. However, the computation of unlabeled linear regression proves to be cumbersome and existing algorithms typically require considerable time, especially in the high dimensional regime. In this paper, we propose a one-step estimator which is optimal from both the computational and the statistical aspects. From the computational perspective, our estimator exhibits the same order of computational complexity as that of the oracle case (which means the regression coefficients are known in advance and only the permutation needs recovery). From the statistical perspective, when comparing with the necessary conditions for permutation recovery, our requirement on the \emph{signal-to-noise ratio} ($\mathsf{SNR}$) agrees up to merely $\Omega\left(\log \log n\right)$ difference when the stable rank of the regression coefficients $\ensuremath{\mathbf{B}}^{\natural}$ is much less than $\log n/\log \log n$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-zhang20n, title = {Optimal Estimator for Unlabeled Linear Regression}, author = {Zhang, Hang and Li, Ping}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {11153--11162}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/zhang20n/zhang20n.pdf}, url = {https://proceedings.mlr.press/v119/zhang20n.html}, abstract = {Unlabeled linear regression, or “linear regression with an unknown permutation”, has attracted increasing attentions due to its applications in (e.g.,) linkage record and de-anonymization. However, the computation of unlabeled linear regression proves to be cumbersome and existing algorithms typically require considerable time, especially in the high dimensional regime. In this paper, we propose a one-step estimator which is optimal from both the computational and the statistical aspects. From the computational perspective, our estimator exhibits the same order of computational complexity as that of the oracle case (which means the regression coefficients are known in advance and only the permutation needs recovery). From the statistical perspective, when comparing with the necessary conditions for permutation recovery, our requirement on the \emph{signal-to-noise ratio} ($\mathsf{SNR}$) agrees up to merely $\Omega\left(\log \log n\right)$ difference when the stable rank of the regression coefficients $\ensuremath{\mathbf{B}}^{\natural}$ is much less than $\log n/\log \log n$. } }
Endnote
%0 Conference Paper %T Optimal Estimator for Unlabeled Linear Regression %A Hang Zhang %A Ping Li %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-zhang20n %I PMLR %P 11153--11162 %U https://proceedings.mlr.press/v119/zhang20n.html %V 119 %X Unlabeled linear regression, or “linear regression with an unknown permutation”, has attracted increasing attentions due to its applications in (e.g.,) linkage record and de-anonymization. However, the computation of unlabeled linear regression proves to be cumbersome and existing algorithms typically require considerable time, especially in the high dimensional regime. In this paper, we propose a one-step estimator which is optimal from both the computational and the statistical aspects. From the computational perspective, our estimator exhibits the same order of computational complexity as that of the oracle case (which means the regression coefficients are known in advance and only the permutation needs recovery). From the statistical perspective, when comparing with the necessary conditions for permutation recovery, our requirement on the \emph{signal-to-noise ratio} ($\mathsf{SNR}$) agrees up to merely $\Omega\left(\log \log n\right)$ difference when the stable rank of the regression coefficients $\ensuremath{\mathbf{B}}^{\natural}$ is much less than $\log n/\log \log n$.
APA
Zhang, H. & Li, P.. (2020). Optimal Estimator for Unlabeled Linear Regression. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:11153-11162 Available from https://proceedings.mlr.press/v119/zhang20n.html.

Related Material