Online Distribution Learning with Local Privacy Constraints

Jin Sima; Changlong Wu; Olgica Milenkovic; Wojciech Szpankowski

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:460-468, 2024.

Abstract

We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. The problem may be succinctly stated as follows. Let $\mathcal{F}$ be a distribution-valued function class with an unbounded label set. Our aim is to estimate an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion. More precisely, at time $t$, given a sample ${\mathbf{x}}_t$, we generate an estimate of $f({\mathbf{x}}_t)$ using only a \emph{privatized} version of the true \emph{labels} sampled from $f({\mathbf{x}}_t)$. The objective is to minimize the cumulative KL-risk of a finite horizon $T$. We show that under $(\epsilon,0)$-local differential privacy for the labels, the KL-risk equals $\tilde{\Theta}(\frac{1}{\epsilon}\sqrt{KT}),$ up to poly-logarithmic factors, where $K=|\mathcal{F}|$. This result significantly differs from the $\tilde{\Theta}(\sqrt{T\log K})$ bound derived in Wu et al., (2023a) for \emph{bounded} label sets. As a side-result, our approach recovers a nearly tight upper bound for the hypothesis selection problem of Gopi et al., (2020), which has only been established for the \emph{batch} setting.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-sima24a,
  title = 	 {Online Distribution Learning with Local Privacy Constraints},
  author =       {Sima, Jin and Wu, Changlong and Milenkovic, Olgica and Szpankowski, Wojciech},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {460--468},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/sima24a/sima24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/sima24a.html},
  abstract = 	 {We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. The problem may be succinctly stated as follows. Let $\mathcal{F}$ be a distribution-valued function class with an unbounded label set. Our aim is to estimate an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion. More precisely, at time $t$, given a sample ${\mathbf{x}}_t$, we generate an estimate of $f({\mathbf{x}}_t)$ using only a \emph{privatized} version of the true \emph{labels} sampled from $f({\mathbf{x}}_t)$. The objective is to minimize the cumulative KL-risk of a finite horizon $T$. We show that under $(\epsilon,0)$-local differential privacy for the labels, the KL-risk equals $\tilde{\Theta}(\frac{1}{\epsilon}\sqrt{KT}),$ up to poly-logarithmic factors, where $K=|\mathcal{F}|$. This result significantly differs from the $\tilde{\Theta}(\sqrt{T\log K})$ bound derived in Wu et al., (2023a) for \emph{bounded} label sets. As a side-result, our approach recovers a nearly tight upper bound for the hypothesis selection problem of Gopi et al., (2020), which has only been established for the \emph{batch} setting.}
}

Endnote

%0 Conference Paper
%T Online Distribution Learning with Local Privacy Constraints
%A Jin Sima
%A Changlong Wu
%A Olgica Milenkovic
%A Wojciech Szpankowski
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-sima24a
%I PMLR
%P 460--468
%U https://proceedings.mlr.press/v238/sima24a.html
%V 238
%X We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. The problem may be succinctly stated as follows. Let $\mathcal{F}$ be a distribution-valued function class with an unbounded label set. Our aim is to estimate an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion. More precisely, at time $t$, given a sample ${\mathbf{x}}_t$, we generate an estimate of $f({\mathbf{x}}_t)$ using only a \emph{privatized} version of the true \emph{labels} sampled from $f({\mathbf{x}}_t)$. The objective is to minimize the cumulative KL-risk of a finite horizon $T$. We show that under $(\epsilon,0)$-local differential privacy for the labels, the KL-risk equals $\tilde{\Theta}(\frac{1}{\epsilon}\sqrt{KT}),$ up to poly-logarithmic factors, where $K=|\mathcal{F}|$. This result significantly differs from the $\tilde{\Theta}(\sqrt{T\log K})$ bound derived in Wu et al., (2023a) for \emph{bounded} label sets. As a side-result, our approach recovers a nearly tight upper bound for the hypothesis selection problem of Gopi et al., (2020), which has only been established for the \emph{batch} setting.

APA

Sima, J., Wu, C., Milenkovic, O. & Szpankowski, W. (2024). Online Distribution Learning with Local Privacy Constraints. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:460-468. Available from https://proceedings.mlr.press/v238/sima24a.html.
