Testing Tail Weight of a Distribution Via Hazard Rate

Maryam Aliakbarpour, Amartya Shankha Biswas, Kavya Ravichandran, Ronitt Rubinfeld
Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:34-81, 2023.

Abstract

Understanding the shape of a distribution of data is of interest to people in a great variety of fields, as it may affect the types of algorithms used for that data. We study one such problem in the framework of {\em distribution property testing}, characterizing the number of samples required to to distinguish whether a distribution has a certain property or is far from having that property. In particular, given samples from a distribution, we seek to characterize the tail of the distribution, that is, understand how many elements appear infrequently. We develop an algorithm based on a careful bucketing scheme that distinguishes light-tailed distributions from non-light-tailed ones with respect to a definition based on the hazard rate, under natural smoothness and ordering assumptions. We bound the number of samples required for this test to succeed with high probability in terms of the parameters of the problem, showing that it is polynomial in these parameters. Further, we prove a hardness result that implies that this problem cannot be solved without any assumptions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v201-aliakbarpour23a, title = {Testing Tail Weight of a Distribution Via Hazard Rate}, author = {Aliakbarpour, Maryam and Biswas, Amartya Shankha and Ravichandran, Kavya and Rubinfeld, Ronitt}, booktitle = {Proceedings of The 34th International Conference on Algorithmic Learning Theory}, pages = {34--81}, year = {2023}, editor = {Agrawal, Shipra and Orabona, Francesco}, volume = {201}, series = {Proceedings of Machine Learning Research}, month = {20 Feb--23 Feb}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v201/aliakbarpour23a/aliakbarpour23a.pdf}, url = {https://proceedings.mlr.press/v201/aliakbarpour23a.html}, abstract = {Understanding the shape of a distribution of data is of interest to people in a great variety of fields, as it may affect the types of algorithms used for that data. We study one such problem in the framework of {\em distribution property testing}, characterizing the number of samples required to to distinguish whether a distribution has a certain property or is far from having that property. In particular, given samples from a distribution, we seek to characterize the tail of the distribution, that is, understand how many elements appear infrequently. We develop an algorithm based on a careful bucketing scheme that distinguishes light-tailed distributions from non-light-tailed ones with respect to a definition based on the hazard rate, under natural smoothness and ordering assumptions. We bound the number of samples required for this test to succeed with high probability in terms of the parameters of the problem, showing that it is polynomial in these parameters. Further, we prove a hardness result that implies that this problem cannot be solved without any assumptions. } }
Endnote
%0 Conference Paper %T Testing Tail Weight of a Distribution Via Hazard Rate %A Maryam Aliakbarpour %A Amartya Shankha Biswas %A Kavya Ravichandran %A Ronitt Rubinfeld %B Proceedings of The 34th International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2023 %E Shipra Agrawal %E Francesco Orabona %F pmlr-v201-aliakbarpour23a %I PMLR %P 34--81 %U https://proceedings.mlr.press/v201/aliakbarpour23a.html %V 201 %X Understanding the shape of a distribution of data is of interest to people in a great variety of fields, as it may affect the types of algorithms used for that data. We study one such problem in the framework of {\em distribution property testing}, characterizing the number of samples required to to distinguish whether a distribution has a certain property or is far from having that property. In particular, given samples from a distribution, we seek to characterize the tail of the distribution, that is, understand how many elements appear infrequently. We develop an algorithm based on a careful bucketing scheme that distinguishes light-tailed distributions from non-light-tailed ones with respect to a definition based on the hazard rate, under natural smoothness and ordering assumptions. We bound the number of samples required for this test to succeed with high probability in terms of the parameters of the problem, showing that it is polynomial in these parameters. Further, we prove a hardness result that implies that this problem cannot be solved without any assumptions.
APA
Aliakbarpour, M., Biswas, A.S., Ravichandran, K. & Rubinfeld, R.. (2023). Testing Tail Weight of a Distribution Via Hazard Rate. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:34-81 Available from https://proceedings.mlr.press/v201/aliakbarpour23a.html.

Related Material