Supervised Quantile Normalization for Low Rank Matrix Factorization

Marco Cuturi, Olivier Teboul, Jonathan Niles-Weed, Jean-Philippe Vert
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2269-2279, 2020.

Abstract

Low rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as tf-idf scaling or data whitening. We propose in this work to learn these normalization operators jointly with the factorization itself. More precisely, given a $d\times n$ matrix $X$ of $d$ features measured on $n$ individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. This optimization is facilitated by the introduction of a new differentiable quantile normalization operator built using optimal transport, providing new results on top of existing work by Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.
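To make the construction concrete, the differentiable quantile normalization operator can be sketched with entropy-regularized optimal transport, in the spirit of the soft sorting of Cuturi et al. (2019): the values of a row of $X$ are transported onto a vector of target quantile values, and each entry is then replaced by its barycentric projection under the resulting transport plan. The NumPy sketch below is illustrative only, not the authors' implementation; the function name sinkhorn_quantile_normalize, the fixed regularization eps, and the iteration count are assumptions made for this example.

import numpy as np

def sinkhorn_quantile_normalize(x, target_quantiles, eps=1e-2, n_iters=500):
    # Entropy-regularized OT between the empirical distribution of the
    # values in x (uniform weights a) and a vector of target quantile
    # values (uniform weights b), followed by a barycentric projection.
    n, m = x.shape[0], target_quantiles.shape[0]
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    C = (x[:, None] - target_quantiles[None, :]) ** 2
    C = C / C.max()                     # rescale costs for numerical stability
    K = np.exp(-C / eps)                # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):            # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]     # transport plan; row i sums to a_i
    # Replace each x_i by the average target value it is transported to.
    return (P @ target_quantiles) / a

# Toy usage: push 50 skewed values onto 10 target quantile values. In the
# paper's setting the targets would be learned parameters, not a fixed grid.
rng = np.random.default_rng(0)
x = rng.exponential(size=50)
q = np.linspace(-2.0, 2.0, 10)          # stand-in for learned quantile values
x_tilde = sinkhorn_quantile_normalize(x, q)

Because every step, Sinkhorn iterations included, is differentiable in both the input values and the target quantiles, an operator of this kind can sit inside a factorization loss, letting the quantiles be learned by gradient descent jointly with the factors $U$ and $V$, as the abstract describes.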

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-cuturi20a,
  title     = {Supervised Quantile Normalization for Low Rank Matrix Factorization},
  author    = {Cuturi, Marco and Teboul, Olivier and Niles-Weed, Jonathan and Vert, Jean-Philippe},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2269--2279},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/cuturi20a/cuturi20a.pdf},
  url       = {https://proceedings.mlr.press/v119/cuturi20a.html},
  abstract  = {Low rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as tf-idf scaling or data whitening. We propose in this work to learn these normalization operators jointly with the factorization itself. More precisely, given a $d\times n$ matrix $X$ of $d$ features measured on $n$ individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. This optimization is facilitated by the introduction of a new differentiable quantile normalization operator built using optimal transport, providing new results on top of existing work by Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.}
}
Endnote
%0 Conference Paper
%T Supervised Quantile Normalization for Low Rank Matrix Factorization
%A Marco Cuturi
%A Olivier Teboul
%A Jonathan Niles-Weed
%A Jean-Philippe Vert
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-cuturi20a
%I PMLR
%P 2269--2279
%U https://proceedings.mlr.press/v119/cuturi20a.html
%V 119
%X Low rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as tf-idf scaling or data whitening. We propose in this work to learn these normalization operators jointly with the factorization itself. More precisely, given a $d\times n$ matrix $X$ of $d$ features measured on $n$ individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. This optimization is facilitated by the introduction of a new differentiable quantile normalization operator built using optimal transport, providing new results on top of existing work by Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.
APA
Cuturi, M., Teboul, O., Niles-Weed, J. & Vert, J.-P. (2020). Supervised Quantile Normalization for Low Rank Matrix Factorization. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2269-2279. Available from https://proceedings.mlr.press/v119/cuturi20a.html.