Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects

Yao Zhang, Jeroen Berrevoets, Mihaela Van Der Schaar
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:4158-4177, 2022.

Abstract

Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the dimensionality curse for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability. These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained through other dimensionality reduction methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-zhang22b,
  title = {Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects},
  author = {Zhang, Yao and Berrevoets, Jeroen and Van Der Schaar, Mihaela},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = {4158--4177},
  year = {2022},
  editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = {151},
  series = {Proceedings of Machine Learning Research},
  month = {28--30 Mar},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v151/zhang22b/zhang22b.pdf},
  url = {https://proceedings.mlr.press/v151/zhang22b.html},
  abstract = {Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the dimensionality curse for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability. These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained through other dimensionality reduction methods.}
}
Endnote
%0 Conference Paper
%T Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects
%A Yao Zhang
%A Jeroen Berrevoets
%A Mihaela Van Der Schaar
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-zhang22b
%I PMLR
%P 4158--4177
%U https://proceedings.mlr.press/v151/zhang22b.html
%V 151
%X Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the dimensionality curse for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability. These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained through other dimensionality reduction methods.
APA
Zhang, Y., Berrevoets, J. & Van Der Schaar, M. (2022). Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:4158-4177. Available from https://proceedings.mlr.press/v151/zhang22b.html.
