Using Domain Knowledge to Overcome Latent Variables in Causal Inference from Time Series

Min Zheng, Samantha Kleinberg
Proceedings of the 4th Machine Learning for Healthcare Conference, PMLR 106:474-489, 2019.

Abstract

Increasingly large observational datasets from healthcare and social media may allow new types of causal inference. However, these data are often missing key variables, increasing the chance of finding spurious causal relationships due to confounding. While methods exist for causal inference with latent variables in static cases, temporal relationships are more challenging, as varying time lags make latent causes more difficult to uncover and approaches often have significantly higher computational complexity. To address this, we make the key observation that while a variable may be latent in one dataset, it may be observed in another, or we may have domain knowledge about its effects. We propose a computationally efficient method that overcomes latent variables by using prior knowledge to reconstruct data for unobserved variables, while remaining robust to cases when the knowledge is wrong or does not apply. On simulated data, our approach outperforms the state of the art with a lower false discovery rate for causal inference. On real-world data from individuals with Type 1 diabetes, we show that our approach can discover causal relationships involving unmeasured meals and exercise.

Cite this Paper


BibTeX
@InProceedings{pmlr-v106-zheng19a,
  title = {Using Domain Knowledge to Overcome Latent Variables in Causal Inference from Time Series},
  author = {Zheng, Min and Kleinberg, Samantha},
  booktitle = {Proceedings of the 4th Machine Learning for Healthcare Conference},
  pages = {474--489},
  year = {2019},
  editor = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume = {106},
  series = {Proceedings of Machine Learning Research},
  month = {09--10 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v106/zheng19a/zheng19a.pdf},
  url = {https://proceedings.mlr.press/v106/zheng19a.html},
  abstract = {Increasingly large observational datasets from healthcare and social media may allow new types of causal inference. However, these data are often missing key variables, increasing the chance of finding spurious causal relationships due to confounding. While methods exist for causal inference with latent variables in static cases, temporal relationships are more challenging, as varying time lags make latent causes more difficult to uncover and approaches often have significantly higher computational complexity. To address this, we make the key observation that while a variable may be latent in one dataset, it may be observed in another, or we may have domain knowledge about its effects. We propose a computationally efficient method that overcomes latent variables by using prior knowledge to reconstruct data for unobserved variables, while remaining robust to cases when the knowledge is wrong or does not apply. On simulated data, our approach outperforms the state of the art with a lower false discovery rate for causal inference. On real-world data from individuals with Type 1 diabetes, we show that our approach can discover causal relationships involving unmeasured meals and exercise.}
}
Endnote
%0 Conference Paper
%T Using Domain Knowledge to Overcome Latent Variables in Causal Inference from Time Series
%A Min Zheng
%A Samantha Kleinberg
%B Proceedings of the 4th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2019
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v106-zheng19a
%I PMLR
%P 474--489
%U https://proceedings.mlr.press/v106/zheng19a.html
%V 106
%X Increasingly large observational datasets from healthcare and social media may allow new types of causal inference. However, these data are often missing key variables, increasing the chance of finding spurious causal relationships due to confounding. While methods exist for causal inference with latent variables in static cases, temporal relationships are more challenging, as varying time lags make latent causes more difficult to uncover and approaches often have significantly higher computational complexity. To address this, we make the key observation that while a variable may be latent in one dataset, it may be observed in another, or we may have domain knowledge about its effects. We propose a computationally efficient method that overcomes latent variables by using prior knowledge to reconstruct data for unobserved variables, while remaining robust to cases when the knowledge is wrong or does not apply. On simulated data, our approach outperforms the state of the art with a lower false discovery rate for causal inference. On real-world data from individuals with Type 1 diabetes, we show that our approach can discover causal relationships involving unmeasured meals and exercise.
APA
Zheng, M. & Kleinberg, S. (2019). Using Domain Knowledge to Overcome Latent Variables in Causal Inference from Time Series. Proceedings of the 4th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 106:474-489. Available from https://proceedings.mlr.press/v106/zheng19a.html.
