Kernel Methods for Radial Transformed Compositional Data with Many Zeros

Junyoung Park, Changwon Yoon, Cheolwoo Park, Jeongyoun Ahn
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:17458-17472, 2022.

Abstract

Compositional data analysis with a high proportion of zeros has gained increasing popularity, especially in chemometrics and human gut microbiome research. Statistical analyses of this type of data are typically carried out via a log-ratio transformation after replacing zeros with small positive values. However, this procedure is geometrically improper because it introduces anomalous distortions through the transformation. We propose a radial transformation that does not require zero substitution and, more importantly, results in an essential equivalence between the domains before and after the transformation. We show that a rich class of kernels on hyperspheres can successfully define a kernel embedding for compositional data based on this equivalence. To the best of our knowledge, this is the first work that theoretically establishes that the extensive library of kernel-based machine learning methods can be applied to compositional data. The applicability of the proposed approach is demonstrated with kernel principal component analysis.
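The abstract gives no formulas, so the following Python/NumPy sketch is illustrative only and is not the authors' implementation. It contrasts the conventional route (replace zeros by a small positive value, then apply the centered log-ratio, clr) with a radial route followed by kernel PCA. Several pieces are assumptions: the radial transformation is taken to be the projection x ↦ x/‖x‖₂ onto the unit sphere, the kernel k(u, v) = exp(γ⟨u, v⟩) is just one hypothetical example of a kernel on the sphere, and all function names (`closure`, `clr_with_replacement`, `radial_projection`, `sphere_kernel`, `kernel_pca`) and parameters (`delta`, `gamma`) are hypothetical.

```python
import numpy as np


def closure(x):
    """Rescale each row to sum to 1 so it lies on the simplex."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def clr_with_replacement(x, delta=1e-6):
    """Conventional route: replace zeros by a small positive delta, re-close,
    then take the centered log-ratio (clr). delta is an arbitrary choice."""
    x = np.asarray(x, dtype=float)
    x = closure(np.where(x > 0, x, delta))
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def radial_projection(x):
    """ASSUMED radial map x -> x / ||x||_2 onto the unit sphere.
    Zero parts stay exactly zero, so no substitution is needed; the result
    is unchanged by rescaling x, so closing it first is unnecessary."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sphere_kernel(z, gamma=1.0):
    """HYPOTHETICAL example kernel on the sphere, k(u, v) = exp(gamma * <u, v>),
    which depends only on the inner product of unit vectors."""
    return np.exp(gamma * (z @ z.T))


def kernel_pca(K, n_components=2):
    """Standard kernel PCA scores for the training points."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = J @ K @ J                            # kernel centered in feature space
    vals, vecs = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[top], vecs[:, top]
    # Projection of each point onto a principal axis = eigenvector * sqrt(eigenvalue)
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


# Toy compositions (rows sum to 1) with several exact zeros.
X = np.array([[0.5, 0.5, 0.0],
              [0.4, 0.4, 0.2],
              [0.0, 0.3, 0.7],
              [0.1, 0.0, 0.9],
              [0.3, 0.3, 0.4]])

Z = radial_projection(X)
scores = kernel_pca(sphere_kernel(Z, gamma=1.0), n_components=2)

# The clr coordinate of the zero part depends heavily on the arbitrary delta.
print(clr_with_replacement(X, delta=1e-6)[0, 2])
print(clr_with_replacement(X, delta=1e-3)[0, 2])
print(scores.shape)  # (5, 2)
```

With this toy data, moving `delta` from 1e-6 to 1e-3 shifts the clr coordinate of the zero part in the first row by (2/3)·ln 1000 ≈ 4.6, while the radial projection involves no such tuning parameter and keeps the zeros as zeros.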

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-park22d,
  title = {Kernel Methods for Radial Transformed Compositional Data with Many Zeros},
  author = {Park, Junyoung and Yoon, Changwon and Park, Cheolwoo and Ahn, Jeongyoun},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {17458--17472},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/park22d/park22d.pdf},
  url = {https://proceedings.mlr.press/v162/park22d.html},
  abstract = {Compositional data analysis with a high proportion of zeros has gained increasing popularity, especially in chemometrics and human gut microbiomes research. Statistical analyses of this type of data are typically carried out via a log-ratio transformation after replacing zeros with small positive values. We should note, however, that this procedure is geometrically improper, as it causes anomalous distortions through the transformation. We propose a radial transformation that does not require zero substitutions and more importantly results in essential equivalence between domains before and after the transformation. We show that a rich class of kernels on hyperspheres can successfully define a kernel embedding for compositional data based on this equivalence. To the best of our knowledge, this is the first work that theoretically establishes the availability of the extensive library of kernel-based machine learning methods for compositional data. The applicability of the proposed approach is demonstrated with kernel principal component analysis.}
}
EndNote
%0 Conference Paper
%T Kernel Methods for Radial Transformed Compositional Data with Many Zeros
%A Junyoung Park
%A Changwon Yoon
%A Cheolwoo Park
%A Jeongyoun Ahn
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-park22d
%I PMLR
%P 17458--17472
%U https://proceedings.mlr.press/v162/park22d.html
%V 162
%X Compositional data analysis with a high proportion of zeros has gained increasing popularity, especially in chemometrics and human gut microbiomes research. Statistical analyses of this type of data are typically carried out via a log-ratio transformation after replacing zeros with small positive values. We should note, however, that this procedure is geometrically improper, as it causes anomalous distortions through the transformation. We propose a radial transformation that does not require zero substitutions and more importantly results in essential equivalence between domains before and after the transformation. We show that a rich class of kernels on hyperspheres can successfully define a kernel embedding for compositional data based on this equivalence. To the best of our knowledge, this is the first work that theoretically establishes the availability of the extensive library of kernel-based machine learning methods for compositional data. The applicability of the proposed approach is demonstrated with kernel principal component analysis.
APA
Park, J., Yoon, C., Park, C. & Ahn, J. (2022). Kernel Methods for Radial Transformed Compositional Data with Many Zeros. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:17458-17472. Available from https://proceedings.mlr.press/v162/park22d.html.
