Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

Ping Ma; Xinlian Zhang; Xin Xing; Jingyi Ma; Michael Mahoney

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1026-1035, 2020.

Abstract

The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators. However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking. In this article, we develop asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem. In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities. The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters. For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions. Moreover, the sampling estimator is asymptotically unbiased in both settings. Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator. Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.
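As a rough illustration of what a sampling estimator for least squares with arbitrary sampling probabilities can look like, here is a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the abstract does not define the sampling schemes, so the sketch assumes that "root leverage" means probabilities proportional to the square root of the leverage scores and that "predictor length" means probabilities proportional to the Euclidean norm of each row of the design matrix. Rows are drawn with replacement and rescaled by 1/sqrt(r * p_i) before solving the smaller least-squares problem, which is one common construction and is assumed here.

```python
import numpy as np


def leverage_scores(X):
    """Leverage scores: squared row norms of Q from a thin QR factorization of X."""
    Q, _ = np.linalg.qr(X)  # default mode is 'reduced', so Q has shape (n, p)
    return np.sum(Q ** 2, axis=1)


def sampling_probabilities(X, scheme="uniform"):
    """Return a probability vector over the rows of X.

    The 'root_leverage' and 'predictor_length' definitions below are
    assumptions made for this sketch; the abstract only names them.
    """
    n = X.shape[0]
    if scheme == "uniform":
        w = np.ones(n)
    elif scheme == "leverage":
        w = leverage_scores(X)
    elif scheme == "root_leverage":
        w = np.sqrt(leverage_scores(X))  # assumed: proportional to sqrt(leverage)
    elif scheme == "predictor_length":
        w = np.linalg.norm(X, axis=1)  # assumed: proportional to row norm
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w / w.sum()


def sampling_estimator(X, y, probs, r, rng):
    """Least-squares fit on r rows drawn with replacement according to probs.

    Each sampled row is rescaled by 1 / sqrt(r * p_i), an assumed (but common)
    weighting for subsampled least squares.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=r, replace=True, p=probs)
    scale = 1.0 / np.sqrt(r * probs[idx])
    X_sub = X[idx] * scale[:, None]
    y_sub = y[idx] * scale
    beta, *_ = np.linalg.lstsq(X_sub, y_sub, rcond=None)
    return beta
```

A caller would build `probs` with `sampling_probabilities(X, scheme)` and pass it to `sampling_estimator` together with a `numpy.random.Generator`. Repeating the draw many times gives an empirical picture of the estimator's distribution around either the full-sample least-squares solution or the true parameters, which are the two settings the abstract describes.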

Cite this Paper

BibTeX

@InProceedings{pmlr-v108-ma20b,
  title = 	 {Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms},
  author =       {Ma, Ping and Zhang, Xinlian and Xing, Xin and Ma, Jingyi and Mahoney, Michael},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1026--1035},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/ma20b/ma20b.pdf},
  url = 	 {https://proceedings.mlr.press/v108/ma20b.html},
  abstract = 	 {The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators.  However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking.  In this article, we develop asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem.  In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities.  The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters.  For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions.  Moreover, the sampling estimator is asymptotically unbiased in both settings.  Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities.  Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator.  Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.}
}

Endnote

%0 Conference Paper
%T Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
%A Ping Ma
%A Xinlian Zhang
%A Xin Xing
%A Jingyi Ma
%A Michael Mahoney
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-ma20b
%I PMLR
%P 1026--1035
%U https://proceedings.mlr.press/v108/ma20b.html
%V 108
%X The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators.  However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking.  In this article, we develop asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem.  In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities.  The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters.  For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions.  Moreover, the sampling estimator is asymptotically unbiased in both settings.  Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities.  Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator.  Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.

APA

Ma, P., Zhang, X., Xing, X., Ma, J. & Mahoney, M. (2020). Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1026-1035. Available from https://proceedings.mlr.press/v108/ma20b.html.
