PAC Mode Estimation using PPR Martingale Confidence Sequences

Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet J. Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, Shivaram Kalyanakrishnan
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5815-5852, 2022.

Abstract

We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is handled very well by prior-posterior-ratio (PPR) martingale confidence sequences (Waudby-Smith and Ramdas, 2020), we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. To begin, we show that the "one-versus-one" principle to generalise from $K = 2$ to $K \geq 2$ classes is more efficient than the "one-versus-rest" alternative. We then prove that our resulting stopping rule, denoted PPR-1v1, is asymptotically optimal (as the mistake probability is taken to 0). PPR-1v1 is simple and computationally light, and incurs significantly fewer samples than competitors even in the non-asymptotic regime. We demonstrate its gains in two practical applications of sampling: election forecasting and verification of smart contracts in blockchains.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-anand-jain22a, title = { PAC Mode Estimation using PPR Martingale Confidence Sequences }, author = {Anand Jain, Shubham and Shah, Rohan and Gupta, Sanit and Mehta, Denil and Nair, Inderjeet J. and Vora, Jian and Khyalia, Sushil and Das, Sourav and Ribeiro, Vinay J. and Kalyanakrishnan, Shivaram}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, pages = {5815--5852}, year = {2022}, editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel}, volume = {151}, series = {Proceedings of Machine Learning Research}, month = {28--30 Mar}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v151/anand-jain22a/anand-jain22a.pdf}, url = {https://proceedings.mlr.press/v151/anand-jain22a.html}, abstract = { We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is handled very well by prior-posterior-ratio (PPR) martingale confidence sequences (Waudby-Smith and Ramdas, 2020), we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. To begin, we show that the "one-versus-one" principle to generalise from $K = 2$ to $K \geq 2$ classes is more efficient than the "one-versus-rest" alternative. We then prove that our resulting stopping rule, denoted PPR-1v1, is asymptotically optimal (as the mistake probability is taken to 0). PPR-1v1 is simple and computationally light, and incurs significantly fewer samples than competitors even in the non-asymptotic regime. We demonstrate its gains in two practical applications of sampling: election forecasting and verification of smart contracts in blockchains. } }
Endnote
%0 Conference Paper %T PAC Mode Estimation using PPR Martingale Confidence Sequences %A Shubham Anand Jain %A Rohan Shah %A Sanit Gupta %A Denil Mehta %A Inderjeet J. Nair %A Jian Vora %A Sushil Khyalia %A Sourav Das %A Vinay J. Ribeiro %A Shivaram Kalyanakrishnan %B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2022 %E Gustau Camps-Valls %E Francisco J. R. Ruiz %E Isabel Valera %F pmlr-v151-anand-jain22a %I PMLR %P 5815--5852 %U https://proceedings.mlr.press/v151/anand-jain22a.html %V 151 %X We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is handled very well by prior-posterior-ratio (PPR) martingale confidence sequences (Waudby-Smith and Ramdas, 2020), we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. To begin, we show that the "one-versus-one" principle to generalise from $K = 2$ to $K \geq 2$ classes is more efficient than the "one-versus-rest" alternative. We then prove that our resulting stopping rule, denoted PPR-1v1, is asymptotically optimal (as the mistake probability is taken to 0). PPR-1v1 is simple and computationally light, and incurs significantly fewer samples than competitors even in the non-asymptotic regime. We demonstrate its gains in two practical applications of sampling: election forecasting and verification of smart contracts in blockchains.
APA
Anand Jain, S., Shah, R., Gupta, S., Mehta, D., Nair, I.J., Vora, J., Khyalia, S., Das, S., Ribeiro, V.J. & Kalyanakrishnan, S.. (2022). PAC Mode Estimation using PPR Martingale Confidence Sequences . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:5815-5852 Available from https://proceedings.mlr.press/v151/anand-jain22a.html.

Related Material