Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs

Jon Parker; Hans Engler

Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs

Jon Parker, Hans Engler

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:743-750, 2014.

Abstract

Sampling from a probability mass function (PMF) has many applications in modern computing. This paper presents a novel lossy compression method intended for large (O(10^5)) dense PMFs that speeds up the sampling process and guarantees high fidelity sampling. This compression method closely approximates an input PMF P with another PMF Q that is easy to store and sample from. All samples are drawn from Q as opposed to the original input distribution P. We say that Q “spoofs” P while this switch is difficult to detect with a statistical test. The lifetime of Q is the sample size required to detect the switch from P to Q. We show how to compute a single PMF’s lifetime and present numeric examples demonstrating compression rates ranging from 62% to 75% when the input PMF is not sorted and 88% to 99% when the input is already sorted. These examples have speed ups ranging from 1.47 to 2.82 compared to binary search sampling.

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-parker14,
  title = 	 {{Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs}},
  author = 	 {Parker, Jon and Engler, Hans},
  booktitle = 	 {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {743--750},
  year = 	 {2014},
  editor = 	 {Kaski, Samuel and Corander, Jukka},
  volume = 	 {33},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Reykjavik, Iceland},
  month = 	 {22--25 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v33/parker14.pdf},
  url = 	 {https://proceedings.mlr.press/v33/parker14.html},
  abstract = 	 {Sampling from a probability mass function (PMF) has many applications in modern computing.  This paper presents a novel lossy compression method intended for large (O(10^5)) dense PMFs that speeds up the sampling process and guarantees high fidelity sampling.  This compression method closely approximates an input PMF P with another PMF Q that is easy to store and sample from.  All samples are drawn from Q as opposed to the original input distribution P.  We say that Q “spoofs” P while this switch is difficult to detect with a statistical test.  The lifetime of Q is the sample size required to detect the switch from P to Q.  We show how to compute a single PMF’s lifetime and present numeric examples demonstrating compression rates ranging from 62% to 75% when the input PMF is not sorted and 88% to 99% when the input is already sorted.  These examples have speed ups ranging from 1.47 to 2.82 compared to binary search sampling.}
}

Endnote

%0 Conference Paper
%T Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs
%A Jon Parker
%A Hans Engler
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander	
%F pmlr-v33-parker14
%I PMLR
%P 743--750
%U https://proceedings.mlr.press/v33/parker14.html
%V 33
%X Sampling from a probability mass function (PMF) has many applications in modern computing.  This paper presents a novel lossy compression method intended for large (O(10^5)) dense PMFs that speeds up the sampling process and guarantees high fidelity sampling.  This compression method closely approximates an input PMF P with another PMF Q that is easy to store and sample from.  All samples are drawn from Q as opposed to the original input distribution P.  We say that Q “spoofs” P while this switch is difficult to detect with a statistical test.  The lifetime of Q is the sample size required to detect the switch from P to Q.  We show how to compute a single PMF’s lifetime and present numeric examples demonstrating compression rates ranging from 62% to 75% when the input PMF is not sorted and 88% to 99% when the input is already sorted.  These examples have speed ups ranging from 1.47 to 2.82 compared to binary search sampling.

RIS


TY  - CPAPER
TI  - Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs
AU  - Jon Parker
AU  - Hans Engler
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander	
ID  - pmlr-v33-parker14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 743
EP  - 750
L1  - http://proceedings.mlr.press/v33/parker14.pdf
UR  - https://proceedings.mlr.press/v33/parker14.html
AB  - Sampling from a probability mass function (PMF) has many applications in modern computing.  This paper presents a novel lossy compression method intended for large (O(10^5)) dense PMFs that speeds up the sampling process and guarantees high fidelity sampling.  This compression method closely approximates an input PMF P with another PMF Q that is easy to store and sample from.  All samples are drawn from Q as opposed to the original input distribution P.  We say that Q “spoofs” P while this switch is difficult to detect with a statistical test.  The lifetime of Q is the sample size required to detect the switch from P to Q.  We show how to compute a single PMF’s lifetime and present numeric examples demonstrating compression rates ranging from 62% to 75% when the input PMF is not sorted and 88% to 99% when the input is already sorted.  These examples have speed ups ranging from 1.47 to 2.82 compared to binary search sampling.
ER  -

APA


Parker, J. & Engler, H.. (2014). Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:743-750 Available from https://proceedings.mlr.press/v33/parker14.html.

Related Material

Download PDF