Computing Simple Bounds for Regression Estimates for Linear Regression with Interval-valued Covariates
Proceedings of the Twelfth International Symposium on Imprecise Probability: Theories and Applications, PMLR 147:273-279, 2021.
Abstract
In this paper, we deal with linear regression where the covariates are interval-valued and the dependent variable is precise. In contrast to the case where the dependent variable is interval-valued and the covariates are precise, it is far more difficult to compute the set of all ordinary least squares (OLS) estimates obtained as the precise values of the covariates vary over all values compatible with the given intervals. Though the exact solution is difficult to obtain, there are still some simple ways to compute bounds for the regression parameters. We deal with simple linear regression and present three different approaches. The first uses a simple interval-arithmetic consideration applied to the equation for the slope parameter. The second uses reverse regression, swapping the roles of the dependent and the independent variable to make the computation analytically solvable; the solution obtained for the reverse regression then gives an analytical upper bound for the slope parameter of the original regression. The third approach does not directly give bounds for the OLS estimator. Instead, before the actual interval analysis, we first replace the OLS estimator with another linear estimator, a reasonably weighted convex combination of unbiased estimators, each of which is based on only two data points of the data set. It turns out that in the degenerate case of a precise independent variable, this estimator coincides with the OLS estimator. The third method also works if both the independent and the dependent variable are interval-valued, and the case of more than one covariate is manageable as well. A further advantage is that, because the third estimator is analytically accessible, confidence intervals for the bounds can also be established. To compare all three approaches, we conduct a short simulation study.
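The claim that a convex combination of two-point estimators can coincide with OLS for precise data can be checked numerically. The sketch below is only an illustration under an assumption: it weights each pairwise slope by the squared covariate difference, a standard weighting that is known to reproduce the OLS slope. Whether this is the weighting used in the paper is not stated in the abstract, and the function names `ols_slope` and `pairwise_slope` are hypothetical.

```python
from itertools import combinations


def ols_slope(x, y):
    """Ordinary least squares slope for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def pairwise_slope(x, y):
    """Convex combination of two-point slopes (y_i - y_j) / (x_i - x_j).

    Assumed weighting (not taken from the paper): each pair is weighted
    by (x_i - x_j)^2, and the weights are normalised to sum to one.
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        if dx == 0:
            continue  # pair has weight zero and no defined slope
        weight = dx * dx
        numerator += weight * (y[i] - y[j]) / dx
        denominator += weight
    return numerator / denominator


# Small precise (non-interval) data set, chosen arbitrarily.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 10.0]
print(ols_slope(x, y), pairwise_slope(x, y))
```

The two printed values agree up to floating-point rounding, because the sum over all pairs of (x_i - x_j)(y_i - y_j) equals n times the centred cross-product sum, and likewise for the squared differences. The interval analysis itself, which is the subject of the paper, is not attempted here.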