Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:3356-3468, 2023.
Abstract
Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call \emph{combinatorial distribution shift}, where (a) under the test and training distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain \emph{marginal} distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is \emph{not} covered by the training distribution. Focusing on the special case where the labels are given by \emph{bilinear embeddings} into a Hilbert space $\mathcal{H}$: $\mathbb{E}[z \mid x,y] = \langle f_{\star}(x), g_{\star}(y) \rangle_{\mathcal{H}}$, we aim to extrapolate to a test distribution domain that is \emph{not} covered in training, a task we call \emph{bilinear combinatorial extrapolation}. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either \emph{exactly low-rank} or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under \emph{gradual} spectral decay, as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations of two matrices that depends on the \emph{relative} spectral gap rather than the \emph{absolute} spectral gap, a result we believe may be of broader independent interest.
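
As a rough illustration of the setting described in the abstract (not the paper's algorithm), the NumPy sketch below builds a ground-truth bilinear matrix $M_{ij} = \langle f_{\star}(x_i), g_{\star}(y_j)\rangle$ with gradual spectral decay, withholds one block of $(x,y)$ pairs from training while still covering both marginals, and reports the relative versus absolute spectral gap at a truncation level $k$. All names (`f_star`, `g_star`, the block-observation pattern, the zero-fill-then-truncate step, the choice of `k`) are illustrative assumptions.

```python
# Minimal synthetic sketch of the matrix-completion view of
# bilinear combinatorial extrapolation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 60, 50, 30          # number of x-values, y-values, embedding dimension

# Ground-truth embeddings with gradual spectral decay (no sharp rank cutoff).
decay = 1.0 / np.arange(1, d + 1)
f_star = rng.standard_normal((n, d)) * decay   # rows play the role of f_star(x_i)
g_star = rng.standard_normal((m, d)) * decay   # rows play the role of g_star(y_j)
M = f_star @ g_star.T                          # E[z | x_i, y_j] = <f_star(x_i), g_star(y_j)>

# Training coverage: observe a pattern that covers the x- and y-marginals
# separately, but leaves the bottom-right (x, y) block entirely unobserved.
mask = np.ones((n, m), dtype=bool)
mask[n // 2:, m // 2:] = False                 # test block never seen jointly in training

def rank_k(A, k):
    """Best rank-k approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

k = 10
M_obs = np.where(mask, M, 0.0)                 # zero-fill unobserved entries (for illustration)
M_hat = rank_k(M_obs, k)

# Relative vs. absolute spectral gap at truncation level k.
s = np.linalg.svd(M, compute_uv=False)
abs_gap = s[k - 1] - s[k]
rel_gap = (s[k - 1] - s[k]) / s[k - 1]
print(f"absolute gap: {abs_gap:.4f}, relative gap: {rel_gap:.4f}")
print("error on the unobserved block:", np.linalg.norm((M - M_hat)[~mask]))
```

Under gradual decay the absolute gap at $k$ is tiny even when the relative gap is moderate, which is why a perturbation bound depending only on the relative gap is the more useful quantity in this regime.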