[edit]
Matching Map Recovery with an Unknown Number of Outliers
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:891-906, 2023.
Abstract
We consider the problem of finding the matching map between two sets of d-dimensional noisy feature-vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If n and m are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality k∗≤min. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than 5(d\log(4nm/\alpha))^{1/4}, then the true matching map can be recovered with probability 1-\alpha. Interestingly, this threshold does not depend on k^* and is the same as the one obtained in prior work in the case of k = \min(n,m). The procedure for which the aforementioned property is proved is obtained by a data-driven selection among candidate mappings \{\hat\pi_k:k\in[\min(n,m)]\}. Each \hat\pi_k minimizes the sum of squares of distances between two sets of size k. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.