A General Characterization of the Statistical Query Complexity

Vitaly Feldman
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:785-830, 2017.

Abstract

Statistical query (SQ) algorithms are algorithms that have access to an \em SQ oracle for the input distribution $D$ instead of i.i.d. samples from $D$. Given a query function $φ:X \to [-1,1]$, the oracle returns an estimate of $\bf E_x∼D[φ(x)]$ within some tolerance $\tau_φ$ that roughly corresponds to the number of samples. In this work we demonstrate that the complexity of solving an arbitrary statistical problem using SQ algorithms can be captured by a relatively simple notion of statistical dimension that we introduce. SQ algorithms capture a broad spectrum of algorithmic approaches used in theory and practice, most notably, convex optimization techniques. Hence our statistical dimension allows to investigate the power of a variety of algorithmic approaches by analyzing a single linear-algebraic parameter. Such characterizations were investigated over the past 20 years in learning theory but prior characterizations are restricted to the much simpler setting of classification problems relative to a fixed distribution on the domain. Our characterization is also the first to precisely characterize the necessary tolerance of queries. We give applications of our techniques to two open problems in learning theory and to algorithms that are subject to memory and communication constraints.

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-feldman17c, title = {A General Characterization of the Statistical Query Complexity}, author = {Feldman, Vitaly}, booktitle = {Proceedings of the 2017 Conference on Learning Theory}, pages = {785--830}, year = {2017}, editor = {Kale, Satyen and Shamir, Ohad}, volume = {65}, series = {Proceedings of Machine Learning Research}, month = {07--10 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v65/feldman17c/feldman17c.pdf}, url = {https://proceedings.mlr.press/v65/feldman17c.html}, abstract = {Statistical query (SQ) algorithms are algorithms that have access to an \em SQ oracle for the input distribution $D$ instead of i.i.d. samples from $D$. Given a query function $φ:X \to [-1,1]$, the oracle returns an estimate of $\bf E_x∼D[φ(x)]$ within some tolerance $\tau_φ$ that roughly corresponds to the number of samples. In this work we demonstrate that the complexity of solving an arbitrary statistical problem using SQ algorithms can be captured by a relatively simple notion of statistical dimension that we introduce. SQ algorithms capture a broad spectrum of algorithmic approaches used in theory and practice, most notably, convex optimization techniques. Hence our statistical dimension allows to investigate the power of a variety of algorithmic approaches by analyzing a single linear-algebraic parameter. Such characterizations were investigated over the past 20 years in learning theory but prior characterizations are restricted to the much simpler setting of classification problems relative to a fixed distribution on the domain. Our characterization is also the first to precisely characterize the necessary tolerance of queries. We give applications of our techniques to two open problems in learning theory and to algorithms that are subject to memory and communication constraints.} }
Endnote
%0 Conference Paper %T A General Characterization of the Statistical Query Complexity %A Vitaly Feldman %B Proceedings of the 2017 Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2017 %E Satyen Kale %E Ohad Shamir %F pmlr-v65-feldman17c %I PMLR %P 785--830 %U https://proceedings.mlr.press/v65/feldman17c.html %V 65 %X Statistical query (SQ) algorithms are algorithms that have access to an \em SQ oracle for the input distribution $D$ instead of i.i.d. samples from $D$. Given a query function $φ:X \to [-1,1]$, the oracle returns an estimate of $\bf E_x∼D[φ(x)]$ within some tolerance $\tau_φ$ that roughly corresponds to the number of samples. In this work we demonstrate that the complexity of solving an arbitrary statistical problem using SQ algorithms can be captured by a relatively simple notion of statistical dimension that we introduce. SQ algorithms capture a broad spectrum of algorithmic approaches used in theory and practice, most notably, convex optimization techniques. Hence our statistical dimension allows to investigate the power of a variety of algorithmic approaches by analyzing a single linear-algebraic parameter. Such characterizations were investigated over the past 20 years in learning theory but prior characterizations are restricted to the much simpler setting of classification problems relative to a fixed distribution on the domain. Our characterization is also the first to precisely characterize the necessary tolerance of queries. We give applications of our techniques to two open problems in learning theory and to algorithms that are subject to memory and communication constraints.
APA
Feldman, V.. (2017). A General Characterization of the Statistical Query Complexity. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:785-830 Available from https://proceedings.mlr.press/v65/feldman17c.html.

Related Material