[edit]

# Limits of Approximating the Median Treatment Effect

*Proceedings of Thirty Seventh Conference on Learning Theory*, PMLR 247:1-21, 2024.

#### Abstract

Average Treatment Effect (ATE) estimation is a well-studied problem in causal inference. However, it does not necessarily capture the heterogeneity in the data, and several approaches have been proposed to tackle the issue, including estimating the Quantile Treatment Effects. In the finite population setting containing $n$ individuals, with treatment and control values denoted by the potential outcome vectors $\mathbf{a}, \mathbf{b}$, much of the prior work focused on estimating median$(\mathbf{a}) -$ median$(\mathbf{b})$, as it is easier to estimate than the desired estimand of median$(\mathbf{a-b})$, called the Median Treatment Effect (MTE). In this work, we argue that MTE is not estimable and detail a novel notion of approximation that relies on the sorted order of the values in $\mathbf{a-b}$: we approximate the median by a value whose quantiles in $\mathbf{a-b}$ are close to $0.5$ (median). Next, we identify a quantity called \emph{variability} that exactly captures the complexity of MTE estimation. Using this, we establish that when potential outcomes take values in the set $\{0,1,\ldots,k-1\}$ the worst-case (over inputs $\mathbf{a,b}$) optimal (over algorithms) approximation factor of the MTE is $\frac{1}{2}\cdot \frac{2k-3}{2k-1}$. Further, by drawing connections to the notions of instance-optimality studied in theoretical computer science, we show that \emph{every} algorithm for estimating the MTE obtains an approximation error that is no better than the error of an algorithm that computes variability, on roughly a per input basis: hence, variability leads to an almost instance optimal approximation algorithm for estimating the MTE. Finally, we provide a simple linear time algorithm for computing the variability exactly. Unlike much prior works, a particular highlight of our work is that we make no assumptions about how the potential outcome vectors are generated or how they are correlated, except that the potential outcome values are $k$-ary, i.e., take one of $k$ discrete values $\{0,1,\ldots,k-1\}$.