Variance reduction in frequency estimators via control variates method

Rameshwar Pratap; Raghav Kulkarni

Variance reduction in frequency estimators via control variates method

Rameshwar Pratap, Raghav Kulkarni

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:183-193, 2021.

Abstract

Generating succinct summaries (also known as sketches) of massive data streams is becoming increasingly important. Such a task typically requires fast, accurate, and small space algorithms in order to support the downstream applications, mainly in areas such as data analysis, machine learning and data mining. A fundamental and well-studied problem in this context is that of estimating the frequencies of the items appearing in a data stream. The Count-Min-Sketch (Cormode and Muthukrishnan, J. Algorithms, 55(1):58–75, 2005) and Count-Sketch (Charikar et al., Theor. Comput. Sci., 312(1):3–15, 2004) are two known classical algorithms for this purpose. However, a limitation of these techniques is that the variance of their estimate tends to be large. In this work, we address this problem and suggest a technique that reduces the variance in their respective estimates, at the cost of little computational overhead. Our technique relies on the classical Control-Variate trick (Lavenberg and Welch, Manage. Sci., 27:322–335, 1981) used for reducing variance in Monte-Carlo simulation. We present a theoretical analysis of our proposal by carefully choosing the control variates and complement them with experiments on synthetic as well as real-world datasets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v161-pratap21a,
  title = 	 {Variance reduction in frequency estimators via control variates method},
  author =       {Pratap, Rameshwar and Kulkarni, Raghav},
  booktitle = 	 {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {183--193},
  year = 	 {2021},
  editor = 	 {de Campos, Cassio and Maathuis, Marloes H.},
  volume = 	 {161},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {27--30 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v161/pratap21a/pratap21a.pdf},
  url = 	 {https://proceedings.mlr.press/v161/pratap21a.html},
  abstract = 	 {Generating succinct summaries (also known as sketches) of massive data streams is becoming increasingly important. Such a task typically requires fast, accurate, and small space algorithms in order to support the downstream applications, mainly in areas such as data analysis, machine learning and data mining. A fundamental and well-studied problem in this context is that of estimating the frequencies of the items appearing in a data stream. The Count-Min-Sketch (Cormode and Muthukrishnan, J. Algorithms, 55(1):58–75, 2005) and Count-Sketch (Charikar et al., Theor. Comput. Sci., 312(1):3–15, 2004) are two known classical algorithms for this purpose. However, a limitation of these techniques is that the variance of their estimate tends to be large. In this work, we address this problem and suggest a technique that reduces the variance in their respective estimates, at the cost of little computational overhead. Our technique relies on the classical Control-Variate trick (Lavenberg and Welch, Manage. Sci., 27:322–335, 1981) used for reducing variance in Monte-Carlo simulation. We present a theoretical analysis of our proposal by carefully choosing the control variates and complement them with experiments on synthetic as well as real-world datasets.}
}

Endnote

%0 Conference Paper
%T Variance reduction in frequency estimators via control variates method
%A Rameshwar Pratap
%A Raghav Kulkarni
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis	
%F pmlr-v161-pratap21a
%I PMLR
%P 183--193
%U https://proceedings.mlr.press/v161/pratap21a.html
%V 161
%X Generating succinct summaries (also known as sketches) of massive data streams is becoming increasingly important. Such a task typically requires fast, accurate, and small space algorithms in order to support the downstream applications, mainly in areas such as data analysis, machine learning and data mining. A fundamental and well-studied problem in this context is that of estimating the frequencies of the items appearing in a data stream. The Count-Min-Sketch (Cormode and Muthukrishnan, J. Algorithms, 55(1):58–75, 2005) and Count-Sketch (Charikar et al., Theor. Comput. Sci., 312(1):3–15, 2004) are two known classical algorithms for this purpose. However, a limitation of these techniques is that the variance of their estimate tends to be large. In this work, we address this problem and suggest a technique that reduces the variance in their respective estimates, at the cost of little computational overhead. Our technique relies on the classical Control-Variate trick (Lavenberg and Welch, Manage. Sci., 27:322–335, 1981) used for reducing variance in Monte-Carlo simulation. We present a theoretical analysis of our proposal by carefully choosing the control variates and complement them with experiments on synthetic as well as real-world datasets.

APA


Pratap, R. & Kulkarni, R.. (2021). Variance reduction in frequency estimators via control variates method. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:183-193 Available from https://proceedings.mlr.press/v161/pratap21a.html.

Variance reduction in frequency estimators via control variates method

Abstract

Cite this Paper

Related Material