The Sample Complexity of Simple Binary Hypothesis Testing

Ankit Pensia; Varun Jog; Po-Ling Loh

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4205-4206, 2024.

Abstract

The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\alpha, 1-\alpha)$. This problem has only been studied when $\alpha = \beta$ (prior-free) or $\alpha = 1/2$ (Bayesian), and the sample complexity is known to be characterized by the Hellinger divergence between $p$ and $q$, up to multiplicative constants. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \alpha/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen–Shannon and Hellinger families. The main technical result concerns an $f$-divergence inequality between members of the Jensen–Shannon and Hellinger families, which is proved by a combination of information-theoretic tools and case-by-case analyses. We explore applications of our results to robust and distributed (locally-private and communication-constrained) hypothesis testing.
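For intuition, both settings can be solved exactly by brute force in a toy case. The Python sketch below is an illustration under stated assumptions, not the method of the paper: it takes $p = \mathrm{Ber}(p)$ and $q = \mathrm{Ber}(q)$ with $0 < p < q < 1$ (reusing the letters for the Bernoulli parameters), puts prior weight $\alpha$ on $p$ in the Bayesian setting, and treats $p$ as the null hypothesis in the prior-free setting. Because the number of ones is a sufficient statistic, each error is a sum over counts $k = 0, \dots, n$, and the prior-free case uses the randomized Neyman–Pearson test, which rejects $p$ for large counts. For comparison, `chernoff_upper_bound` uses $\min(x, y) \le x^{\lambda} y^{1-\lambda}$ to obtain a sufficient $n$ from $\sum_x p(x)^{\lambda} q(x)^{1-\lambda}$; at $\lambda = 1/2$ this is the Bhattacharyya coefficient $1 - h^2(p, q)$ under the normalization $h^2(p,q) = \tfrac{1}{2}\sum_x (\sqrt{p(x)} - \sqrt{q(x)})^2$, which may differ from the paper's. All function names are hypothetical.

```python
import math


def log_binom_pmf(n: int, k: int, theta: float) -> float:
    """log P(K = k) for K ~ Binomial(n, theta); assumes 0 < theta < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log1p(-theta))


def binom_pmfs(n: int, theta: float) -> list[float]:
    """Binomial(n, theta) pmf for k = 0..n, computed in log space to avoid overflow."""
    return [math.exp(log_binom_pmf(n, k, theta)) for k in range(n + 1)]


def bayes_error(n: int, p: float, q: float, alpha: float) -> float:
    """Bayes (MAP) error for Ber(p)^n vs Ber(q)^n with prior alpha on p.

    The count of ones is sufficient, so sum_x min(alpha p^n(x), (1-alpha) q^n(x))
    collapses to a sum over k = 0..n.
    """
    return sum(min(alpha * a, (1 - alpha) * b)
               for a, b in zip(binom_pmfs(n, p), binom_pmfs(n, q)))


def bayes_sample_complexity(p, q, alpha, delta, n_max=10_000):
    """Smallest n with Bayes error <= delta (the error is non-increasing in n)."""
    for n in range(n_max + 1):
        if bayes_error(n, p, q, alpha) <= delta:
            return n
    raise ValueError("no n <= n_max suffices")


def min_type2_error(n, p, q, alpha):
    """Least type-II error of any randomized test of H0: Ber(p)^n vs H1: Ber(q)^n
    with type-I error <= alpha (Neyman-Pearson); assumes 0 < p < q < 1."""
    pmf_p, pmf_q = binom_pmfs(n, p), binom_pmfs(n, q)
    # For p < q the likelihood ratio increases with the count k, so the optimal
    # test rejects H0 when k > t and rejects with probability gamma at k = t.
    tail = 0.0  # P_p(K > t)
    for t in range(n, -1, -1):
        if tail + pmf_p[t] > alpha:
            gamma = (alpha - tail) / pmf_p[t]  # makes the type-I error exactly alpha
            return sum(pmf_q[:t]) + (1 - gamma) * pmf_q[t]
        tail += pmf_p[t]
    return 0.0  # alpha >= 1: always reject H0


def prior_free_sample_complexity(p, q, alpha, beta, n_max=10_000):
    """Smallest n admitting a test with type-I error <= alpha and type-II error <= beta."""
    for n in range(n_max + 1):
        if min_type2_error(n, p, q, alpha) <= beta:
            return n
    raise ValueError("no n <= n_max suffices")


def chernoff_upper_bound(p, q, alpha, delta, lam=0.5):
    """Sufficient n from Bayes error <= alpha^lam (1-alpha)^(1-lam) * c^n, where
    c = sum_x p(x)^lam q(x)^(1-lam) for 0 < lam < 1 (lam = 0.5: Bhattacharyya)."""
    c = p**lam * q**(1 - lam) + (1 - p)**lam * (1 - q)**(1 - lam)
    prefactor = alpha**lam * (1 - alpha)**(1 - lam)
    return max(0, math.ceil(math.log(prefactor / delta) / -math.log(c)))


n_b = bayes_sample_complexity(0.3, 0.7, alpha=0.5, delta=0.01)
n_pf = prior_free_sample_complexity(0.3, 0.7, alpha=0.05, beta=0.1)
print(n_b, chernoff_upper_bound(0.3, 0.7, 0.5, 0.01), n_pf)
```

Since the optimal errors are non-increasing in $n$, each linear scan returns the exact minimum for the chosen parameters; the Chernoff-type value is only a sufficient sample size, so the exact answer may be noticeably smaller.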

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-pensia24a,
  title = 	 {The Sample Complexity of Simple Binary Hypothesis Testing},
  author =       {Pensia, Ankit and Jog, Varun and Loh, Po-Ling},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {4205--4206},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/pensia24a/pensia24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/pensia24a.html},
  abstract = 	 {The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\alpha, 1-\alpha)$. This problem has only been studied when $\alpha = \beta$ (prior-free) or $\alpha = 1/2$ (Bayesian), and the sample complexity is known to be characterized by the Hellinger divergence between $p$ and $q$, up to multiplicative constants. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \alpha/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen–Shannon and Hellinger families. The main technical result concerns an $f$-divergence inequality between members of the Jensen–Shannon and Hellinger families, which is proved by a combination of information-theoretic tools and case-by-case analyses. We explore applications of our results to robust and distributed (locally-private and communication-constrained) hypothesis testing.}
}

Endnote

%0 Conference Paper
%T The Sample Complexity of Simple Binary Hypothesis Testing
%A Ankit Pensia
%A Varun Jog
%A Po-Ling Loh
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-pensia24a
%I PMLR
%P 4205--4206
%U https://proceedings.mlr.press/v247/pensia24a.html
%V 247
%X The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\alpha, 1-\alpha)$. This problem has only been studied when $\alpha = \beta$ (prior-free) or $\alpha = 1/2$ (Bayesian), and the sample complexity is known to be characterized by the Hellinger divergence between $p$ and $q$, up to multiplicative constants. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \alpha/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen–Shannon and Hellinger families. The main technical result concerns an $f$-divergence inequality between members of the Jensen–Shannon and Hellinger families, which is proved by a combination of information-theoretic tools and case-by-case analyses. We explore applications of our results to robust and distributed (locally-private and communication-constrained) hypothesis testing.

APA


Pensia, A., Jog, V. & Loh, P. (2024). The Sample Complexity of Simple Binary Hypothesis Testing. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4205-4206. Available from https://proceedings.mlr.press/v247/pensia24a.html.
