Householder Sketch for Accurate and Accelerated Least-Mean-Squares Solvers

Jyotikrishna Dass, Rabi Mahapatra
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2467-2477, 2021.

Abstract

Least-Mean-Squares (\textsc{LMS}) solvers comprise a class of fundamental optimization problems such as linear regression, and regularized regressions such as Ridge, LASSO, and Elastic-Net. Data summarization techniques for big data generate summaries called coresets and sketches to speed up model learning under streaming and distributed settings. For example, \citep{nips2019} design a fast and accurate Caratheodory set on input data to boost the performance of existing \textsc{LMS} solvers. In retrospect, we explore classical Householder transformation as a candidate for sketching and accurately solving LMS problems. We find it to be a simpler, memory-efficient, and faster alternative that always existed to the above strong baseline. We also present a scalable algorithm based on the construction of distributed Householder sketches to solve \textsc{LMS} problem across multiple worker nodes. We perform thorough empirical analysis with large synthetic and real datasets to evaluate the performance of Householder sketch and compare with \citep{nips2019}. Our results show Householder sketch speeds up existing \textsc{LMS} solvers in the scikit-learn library up to $100$x-$400$x. Also, it is $10$x-$100$x faster than the above baseline with similar numerical stability. The distributed algorithm demonstrates linear scalability with a near-negligible communication overhead.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-dass21a, title = {Householder Sketch for Accurate and Accelerated Least-Mean-Squares Solvers}, author = {Dass, Jyotikrishna and Mahapatra, Rabi}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {2467--2477}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/dass21a/dass21a.pdf}, url = {https://proceedings.mlr.press/v139/dass21a.html}, abstract = {Least-Mean-Squares (\textsc{LMS}) solvers comprise a class of fundamental optimization problems such as linear regression, and regularized regressions such as Ridge, LASSO, and Elastic-Net. Data summarization techniques for big data generate summaries called coresets and sketches to speed up model learning under streaming and distributed settings. For example, \citep{nips2019} design a fast and accurate Caratheodory set on input data to boost the performance of existing \textsc{LMS} solvers. In retrospect, we explore classical Householder transformation as a candidate for sketching and accurately solving LMS problems. We find it to be a simpler, memory-efficient, and faster alternative that always existed to the above strong baseline. We also present a scalable algorithm based on the construction of distributed Householder sketches to solve \textsc{LMS} problem across multiple worker nodes. We perform thorough empirical analysis with large synthetic and real datasets to evaluate the performance of Householder sketch and compare with \citep{nips2019}. Our results show Householder sketch speeds up existing \textsc{LMS} solvers in the scikit-learn library up to $100$x-$400$x. Also, it is $10$x-$100$x faster than the above baseline with similar numerical stability. The distributed algorithm demonstrates linear scalability with a near-negligible communication overhead.} }
Endnote
%0 Conference Paper %T Householder Sketch for Accurate and Accelerated Least-Mean-Squares Solvers %A Jyotikrishna Dass %A Rabi Mahapatra %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-dass21a %I PMLR %P 2467--2477 %U https://proceedings.mlr.press/v139/dass21a.html %V 139 %X Least-Mean-Squares (\textsc{LMS}) solvers comprise a class of fundamental optimization problems such as linear regression, and regularized regressions such as Ridge, LASSO, and Elastic-Net. Data summarization techniques for big data generate summaries called coresets and sketches to speed up model learning under streaming and distributed settings. For example, \citep{nips2019} design a fast and accurate Caratheodory set on input data to boost the performance of existing \textsc{LMS} solvers. In retrospect, we explore classical Householder transformation as a candidate for sketching and accurately solving LMS problems. We find it to be a simpler, memory-efficient, and faster alternative that always existed to the above strong baseline. We also present a scalable algorithm based on the construction of distributed Householder sketches to solve \textsc{LMS} problem across multiple worker nodes. We perform thorough empirical analysis with large synthetic and real datasets to evaluate the performance of Householder sketch and compare with \citep{nips2019}. Our results show Householder sketch speeds up existing \textsc{LMS} solvers in the scikit-learn library up to $100$x-$400$x. Also, it is $10$x-$100$x faster than the above baseline with similar numerical stability. The distributed algorithm demonstrates linear scalability with a near-negligible communication overhead.
APA
Dass, J. & Mahapatra, R.. (2021). Householder Sketch for Accurate and Accelerated Least-Mean-Squares Solvers. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2467-2477 Available from https://proceedings.mlr.press/v139/dass21a.html.

Related Material