[edit]

# Greed is good: correspondence recovery for unlabeled linear regression

*Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence*, PMLR 216:2509-2518, 2023.

#### Abstract

We consider the unlabeled linear regression reading as $\mathbf{Y} = \mathbf{\Pi}^{*}\mathbf{X}\mathbf{B}^* + \mathbf{W}$, where $\mathbf{\Pi}^{*}, \mathbf{B}^*$ and $\mathbf{W}$ represents missing (or incomplete) correspondence information, signals, and additive noise, respectively. Our goal is to perform data alignment between $\mathbf{Y}$ and $\mathbf{X}$, or equivalently, reconstruct the correspondence information encoded by $\mathbf{\Pi}^*$. Based on whether signal $\mathbf{B}^*$ is given a prior, we separately propose two greedy-selection-based estimators, which both reach the mini-max optimality. Compared with previous works, our work $(i)$ supports partial recovery of the correspondence information; and $(ii)$ applies to a general matrix family rather than the permutation matrices, to put more specifically, selection matrices, where multiple rows of $\mathbf{X}$ can correspond to the same row in $\mathbf{Y}$. Moreover, numerical experiments are provided to corroborate our claims.