Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables

I. Diakonikolas, D. M. Kane, A. Stewart
29th Annual Conference on Learning Theory, PMLR 49:831-849, 2016.

Abstract

We study the structure and learnability of sums of independent integer random variables (SIIRVs). For $k \in \mathbb{Z}_+$, a $k$-SIIRV of order $n \in \mathbb{Z}_+$ is the probability distribution of the sum of $n$ mutually independent random variables, each supported on $\{0, 1, \ldots, k-1\}$. We denote by $\mathcal{S}_{n,k}$ the set of all $k$-SIIRVs of order $n$. How many samples are required to learn an arbitrary distribution in $\mathcal{S}_{n,k}$? In this paper, we tightly characterize the sample and computational complexity of this problem. More precisely, we design a computationally efficient algorithm that uses $\widetilde{O}(k/\epsilon^2)$ samples and learns an arbitrary $k$-SIIRV within error $\epsilon$ in total variation distance. Moreover, we show that the optimal sample complexity of this learning problem is $\Theta\big((k/\epsilon^2)\sqrt{\log(1/\epsilon)}\big)$; i.e., we prove an upper bound and a matching information-theoretic lower bound. Our algorithm proceeds by learning the Fourier transform of the target $k$-SIIRV in its effective support. Its correctness relies on the approximate sparsity of the Fourier transform of $k$-SIIRVs, a structural property that we establish, roughly stating that the Fourier transform of a $k$-SIIRV has small magnitude outside a small set. Along the way, we prove several new structural results about $k$-SIIRVs. As one of our main structural contributions, we give an efficient algorithm to construct a sparse proper $\epsilon$-cover for $\mathcal{S}_{n,k}$ in total variation distance. We also obtain a novel geometric characterization of the space of $k$-SIIRVs. Our characterization allows us to prove a tight lower bound on the size of $\epsilon$-covers for $\mathcal{S}_{n,k}$, establishing that our cover upper bound is optimal, and is the key ingredient in our tight sample complexity lower bound. Our approach of exploiting the sparsity of the Fourier transform in distribution learning is general and has recently found additional applications. In subsequent work, we use a generalization of this idea to obtain the first computationally efficient learning algorithm for Poisson multinomial distributions. In separate work, we build on our Fourier-based approach to obtain the fastest known proper learning algorithm for Poisson binomial distributions ($2$-SIIRVs).
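The algorithmic idea lends itself to a compact illustration. The following is a minimal sketch of the Fourier-based approach, not the authors' exact algorithm: the effective-support width and the truncation threshold below are heuristic stand-ins for the quantities the paper sets via its structural theorems.

```python
import numpy as np

def learn_siirv_fourier(samples, k, eps):
    """Sketch of a Fourier-based learner for k-SIIRVs: estimate the
    empirical Fourier transform on the effective support, discard small
    coefficients (approximate sparsity), and invert."""
    samples = np.asarray(samples, dtype=np.int64)

    # Effective support: an interval around the empirical mean whose
    # half-width is a few standard deviations plus O(k) slack.
    # (The constant 4 and the +k slack are illustrative choices.)
    mu, sigma = samples.mean(), samples.std()
    lo = int(np.floor(mu - 4 * sigma)) - k
    hi = int(np.ceil(mu + 4 * sigma)) + k
    L = hi - lo + 1
    shifted = samples - lo  # positions inside the interval [0, L)

    # Empirical Fourier transform over Z_L:
    #   p_hat(xi) = (1/m) * sum_j exp(-2*pi*i*xi*x_j / L).
    xis = np.arange(L)
    p_hat = np.exp(-2j * np.pi * np.outer(xis, shifted) / L).mean(axis=1)

    # Approximate sparsity: zero out coefficients of small magnitude;
    # the structural theorem guarantees that few large ones remain.
    # The threshold here is a heuristic, not the paper's exact choice.
    p_hat[np.abs(p_hat) < eps / np.sqrt(L)] = 0.0

    # Invert the DFT (numpy's ifft matches the convention above), then
    # clip negative mass and renormalize to obtain a valid distribution.
    q = np.real(np.fft.ifft(p_hat))
    q = np.clip(q, 0.0, None)
    q /= q.sum()
    return {lo + x: q[x] for x in range(L) if q[x] > 0}
```

Truncating the small Fourier coefficients is what denoises the empirical estimate: by the sparsity theorem, only a small set of frequencies carries the mass of a $k$-SIIRV, so the surviving coefficients can be estimated accurately from $\widetilde{O}(k/\epsilon^2)$ samples.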

Cite this Paper


BibTeX
@InProceedings{pmlr-v49-diakonikolas16a,
  title     = {Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables},
  author    = {Diakonikolas, I. and Kane, D. M. and Stewart, A.},
  booktitle = {29th Annual Conference on Learning Theory},
  pages     = {831--849},
  year      = {2016},
  editor    = {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume    = {49},
  series    = {Proceedings of Machine Learning Research},
  address   = {Columbia University, New York, New York, USA},
  month     = {23--26 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v49/diakonikolas16a.pdf},
  url       = {https://proceedings.mlr.press/v49/diakonikolas16a.html}
}
APA
Diakonikolas, I., Kane, D. M., & Stewart, A. (2016). Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables. In Proceedings of the 29th Annual Conference on Learning Theory, Proceedings of Machine Learning Research 49:831-849. Available from https://proceedings.mlr.press/v49/diakonikolas16a.html.
