Learning Invariant Representations with Missing Data

Mark Goldstein, Joern-Henrik Jacobsen, Olina Chau, Adriel Saporta, Aahlad Manas Puli, Rajesh Ranganath, Andrew Miller
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:290-301, 2022.

Abstract

Spurious correlations, or *shortcuts*, allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models that satisfy particular independencies involving the correlation-inducing *nuisance* variable have guarantees on their test performance. However, enforcing such independencies requires nuisances to be observed during training. But nuisances such as demographics or image background labels are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive the missing-mmd estimator used for invariance objectives under missing nuisances. On simulations and clinical data, missing-mmds enable improvements in test performance similar to those achieved by using fully-observed data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v177-goldstein22a,
  title = {Learning Invariant Representations with Missing Data},
  author = {Goldstein, Mark and Jacobsen, Joern-Henrik and Chau, Olina and Saporta, Adriel and Puli, Aahlad Manas and Ranganath, Rajesh and Miller, Andrew},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages = {290--301},
  year = {2022},
  editor = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume = {177},
  series = {Proceedings of Machine Learning Research},
  month = {11--13 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v177/goldstein22a/goldstein22a.pdf},
  url = {https://proceedings.mlr.press/v177/goldstein22a.html},
  abstract = {Spurious correlations, or *shortcuts*, allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models that satisfy particular independencies involving the correlation-inducing *nuisance* variable have guarantees on their test performance. However, enforcing such independencies requires nuisances to be observed during training. But nuisances such as demographics or image background labels are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive the missing-mmd estimator used for invariance objectives under missing nuisances. On simulations and clinical data, missing-mmds enable improvements in test performance similar to those achieved by using fully-observed data.}
}
Endnote
%0 Conference Paper
%T Learning Invariant Representations with Missing Data
%A Mark Goldstein
%A Joern-Henrik Jacobsen
%A Olina Chau
%A Adriel Saporta
%A Aahlad Manas Puli
%A Rajesh Ranganath
%A Andrew Miller
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-goldstein22a
%I PMLR
%P 290--301
%U https://proceedings.mlr.press/v177/goldstein22a.html
%V 177
%X Spurious correlations, or *shortcuts*, allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models that satisfy particular independencies involving the correlation-inducing *nuisance* variable have guarantees on their test performance. However, enforcing such independencies requires nuisances to be observed during training. But nuisances such as demographics or image background labels are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive the missing-mmd estimator used for invariance objectives under missing nuisances. On simulations and clinical data, missing-mmds enable improvements in test performance similar to those achieved by using fully-observed data.
APA
Goldstein, M., Jacobsen, J., Chau, O., Saporta, A., Puli, A.M., Ranganath, R. & Miller, A. (2022). Learning Invariant Representations with Missing Data. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:290-301. Available from https://proceedings.mlr.press/v177/goldstein22a.html.
