The Sparse Vector Technique, Revisited

Haim Kaplan, Yishay Mansour, Uri Stemmer
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2747-2776, 2021.

Abstract

We revisit one of the most basic and widely applicable techniques in the differential privacy literature – the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows asking an unbounded number of queries as long as the answer is close to what we expect, and halts following the first query for which this is not the case. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as no single individual contributes to the answer of too many queries whose answer deviates substantially from what we expect. Our analysis is subtle, and some of its ingredients may be more widely applicable. In some cases our new algorithm allows us to privately extract much more information from the database than the original. We demonstrate this by applying our algorithm to the shifting-heavy-hitters problem: On every time step, each of n users gets a new input, and the task is to privately identify all the current heavy-hitters. That is, on time step i, the goal is to identify all data elements x such that many of the users have x as their current input. We present an algorithm for this problem with improved error guarantees over what can be obtained using existing techniques. Specifically, the error of our algorithm depends on the maximal number of times that a single user holds a heavy-hitter as input, rather than the total number of times in which a heavy-hitter exists.
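
For context, below is a minimal sketch of the classical AboveThreshold form of the sparse vector technique that the abstract refers to. The function name, the use of NumPy, and the specific Laplace noise scales (2/epsilon for the threshold, 4/epsilon per query, following the standard textbook analysis) are illustrative assumptions; this is the original halting algorithm, not the paper's new variant.

import numpy as np

def above_threshold(data, queries, threshold, epsilon, rng=None):
    # Minimal sketch of the classical sparse vector technique (AboveThreshold).
    # `queries` is an iterable of sensitivity-1 functions q(data) -> float.
    # Answers "below" while the noisy query value stays under the noisy
    # threshold, and halts immediately after the first "above" answer.
    rng = rng or np.random.default_rng()
    noisy_threshold = threshold + rng.laplace(scale=2.0 / epsilon)
    answers = []
    for q in queries:
        noisy_value = q(data) + rng.laplace(scale=4.0 / epsilon)
        if noisy_value >= noisy_threshold:
            answers.append("above")
            break  # the classical variant must stop here to remain private
        answers.append("below")
    return answers

The paper's alternative replaces this hard stop with a per-user budget: testing can continue as long as no single individual contributes to too many above-threshold answers, which is what yields the improved error guarantees for shifting heavy hitters.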

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-kaplan21a,
  title     = {The Sparse Vector Technique, Revisited},
  author    = {Kaplan, Haim and Mansour, Yishay and Stemmer, Uri},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {2747--2776},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/kaplan21a/kaplan21a.pdf},
  url       = {https://proceedings.mlr.press/v134/kaplan21a.html},
  abstract  = {We revisit one of the most basic and widely applicable techniques in the differential privacy literature – the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows asking an unbounded number of queries as long as the answer is close to what we expect, and halts following the first query for which this is not the case. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as no single individual contributes to the answer of too many queries whose answer deviates substantially from what we expect. Our analysis is subtle, and some of its ingredients may be more widely applicable. In some cases our new algorithm allows us to privately extract much more information from the database than the original. We demonstrate this by applying our algorithm to the shifting-heavy-hitters problem: On every time step, each of n users gets a new input, and the task is to privately identify all the current heavy-hitters. That is, on time step i, the goal is to identify all data elements x such that many of the users have x as their current input. We present an algorithm for this problem with improved error guarantees over what can be obtained using existing techniques. Specifically, the error of our algorithm depends on the maximal number of times that a single user holds a heavy-hitter as input, rather than the total number of times in which a heavy-hitter exists.}
}
Endnote
%0 Conference Paper
%T The Sparse Vector Technique, Revisited
%A Haim Kaplan
%A Yishay Mansour
%A Uri Stemmer
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-kaplan21a
%I PMLR
%P 2747--2776
%U https://proceedings.mlr.press/v134/kaplan21a.html
%V 134
%X We revisit one of the most basic and widely applicable techniques in the differential privacy literature – the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows asking an unbounded number of queries as long as the answer is close to what we expect, and halts following the first query for which this is not the case. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as no single individual contributes to the answer of too many queries whose answer deviates substantially from what we expect. Our analysis is subtle, and some of its ingredients may be more widely applicable. In some cases our new algorithm allows us to privately extract much more information from the database than the original. We demonstrate this by applying our algorithm to the shifting-heavy-hitters problem: On every time step, each of n users gets a new input, and the task is to privately identify all the current heavy-hitters. That is, on time step i, the goal is to identify all data elements x such that many of the users have x as their current input. We present an algorithm for this problem with improved error guarantees over what can be obtained using existing techniques. Specifically, the error of our algorithm depends on the maximal number of times that a single user holds a heavy-hitter as input, rather than the total number of times in which a heavy-hitter exists.
APA
Kaplan, H., Mansour, Y. & Stemmer, U. (2021). The Sparse Vector Technique, Revisited. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:2747-2776. Available from https://proceedings.mlr.press/v134/kaplan21a.html.