A Generalized Linear Model for Principal Component Analysis of Binary Data

Andrew I. Schein, Lawrence K. Saul, Lyle H. Ungar
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:240-247, 2003.

Abstract

We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. The resulting updates have a simple closed form and are guaranteed at each iteration to improve the model’s likelihood. We evaluate the performance of logistic PCA, as measured by reconstruction error rates, on data sets drawn from four real-world applications. In general, we find that logistic PCA is much better suited to modeling binary data than conventional PCA.

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-schein03a,
  title = {A Generalized Linear Model for Principal Component Analysis of Binary Data},
  author = {Schein, Andrew I. and Saul, Lawrence K. and Ungar, Lyle H.},
  booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics},
  pages = {240--247},
  year = {2003},
  editor = {Bishop, Christopher M. and Frey, Brendan J.},
  volume = {R4},
  series = {Proceedings of Machine Learning Research},
  month = {03--06 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r4/schein03a/schein03a.pdf},
  url = {https://proceedings.mlr.press/r4/schein03a.html},
  abstract = {We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. The resulting updates have a simple closed form and are guaranteed at each iteration to improve the model’s likelihood. We evaluate the performance of logistic PCA—as measured by reconstruction error rates—on data sets drawn from four real world applications. In general, we find that logistic PCA is much better suited to modeling binary data than conventional PCA.},
  note = {Reissued by PMLR on 01 April 2021.}
}
Endnote
%0 Conference Paper
%T A Generalized Linear Model for Principal Component Analysis of Binary Data
%A Andrew I. Schein
%A Lawrence K. Saul
%A Lyle H. Ungar
%B Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2003
%E Christopher M. Bishop
%E Brendan J. Frey
%F pmlr-vR4-schein03a
%I PMLR
%P 240--247
%U https://proceedings.mlr.press/r4/schein03a.html
%V R4
%X We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. The resulting updates have a simple closed form and are guaranteed at each iteration to improve the model’s likelihood. We evaluate the performance of logistic PCA—as measured by reconstruction error rates—on data sets drawn from four real world applications. In general, we find that logistic PCA is much better suited to modeling binary data than conventional PCA.
%Z Reissued by PMLR on 01 April 2021.
APA
Schein, A.I., Saul, L.K. & Ungar, L.H. (2003). A Generalized Linear Model for Principal Component Analysis of Binary Data. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:240-247. Available from https://proceedings.mlr.press/r4/schein03a.html. Reissued by PMLR on 01 April 2021.
