Sparse and Low-bias Estimation of High Dimensional Vector Autoregressive Models
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:55-64, 2020.
Vector autoregressive (VAR) models are widely used for causal discovery and forecasting in multivariate time series analysis. In the high-dimensional setting, which is increasingly common in fields such as neuroscience and econometrics, model parameters are inferred by $L_1$-regularized maximum likelihood (RML). A well-known feature of RML inference is that in general the technique produces a trade-off between sparsity and bias that depends on the choice of the regularization hyperparameter. In the context of multivariate time series analysis, sparse estimates are favorable for causal discovery and low-bias estimates are favorable for forecasting. However, owing to a paucity of research on hyperparameter selection methods, practitioners must rely on ad-hoc methods such as cross-validation (or manual tuning). The particular balance that such approaches achieve between the two goals — causal discovery and forecasting — is poorly understood. Our paper investigates this behavior and proposes a method (UoI-VAR) that achieves a better balance between sparsity and bias when the underlying causal influences are in fact sparse. We demonstrate through simulation that RML with a hyperparameter selected by cross-validation tends to overfit, producing relatively dense estimates. We further demonstrate that UoI-VAR much more effectively approximates the correct sparsity pattern with only a minor compromise in model fit, particularly so for larger data dimensions, and that the estimates produced by UoI-VAR exhibit less bias. We conclude that our method achieves improved performance especially well-suited to applications involving simultaneous causal discovery and forecasting in high-dimensional settings.