Learning Invariant Representations with Missing Data
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:290-301, 2022.
Abstract
Spurious correlations, or *shortcuts*, allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models satisfying particular independencies involving the correlation-inducing *nuisance* variable have guarantees on their test performance. However, enforcing such independencies requires the nuisances to be observed during training, and nuisances such as demographics or image background labels are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive the missing-mmd estimator, an estimator of the maximum mean discrepancy (MMD) used in invariance objectives when nuisances are missing. On simulations and clinical data, missing-mmds enable improvements in test performance similar to those achieved by using fully-observed data.
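
For readers unfamiliar with the quantity involved, the following is a minimal sketch of the standard, fully-observed squared maximum mean discrepancy (MMD) between two samples, which is the kind of discrepancy an invariance objective can penalize between groups defined by the nuisance. It is not the missing-mmd estimator derived in the paper, whose form the abstract does not specify. The Gaussian kernel, the bandwidth, the function names, and the simulated one-dimensional samples are all illustrative assumptions.

```python
import math
import random


def rbf(u, v, bandwidth=1.0):
    # Gaussian (RBF) kernel on scalars; the kernel choice is an assumption.
    return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    # Average kernel value over all pairs (x, y).
    total = sum(rbf(u, v, bandwidth) for u in xs for v in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between two samples.
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical data: scalar "representations" grouped by a binary nuisance.
rng = random.Random(0)
group_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
group_b = [rng.gauss(0.0, 1.0) for _ in range(200)]  # same distribution as group_a
group_c = [rng.gauss(2.0, 1.0) for _ in range(200)]  # shifted distribution

mmd_same = mmd2_biased(group_a, group_b)
mmd_shifted = mmd2_biased(group_a, group_c)
print(mmd_same, mmd_shifted)
```

The estimate is close to zero when the two groups share a distribution and clearly larger when they differ. Note that this sketch requires every sample's group label to be known, which is the requirement that fails when nuisances are missing.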