Online estimation of similarity matrices with incomplete data

Fangchen Yu, Yicheng Zeng, Jianfeng Mao, Wenye Li
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2454-2464, 2023.

Abstract

The similarity matrix measures pairwise similarities between a set of data points and is an essential concept in data processing, routinely used in practical applications. Obtaining a similarity matrix is typically straightforward when data points are completely observed. However, incomplete observations can make it challenging to obtain a high-quality similarity matrix, which becomes even more complex in online data. To address this challenge, we propose matrix correction algorithms that leverage the positive semi-definiteness (PSD) of the similarity matrix to improve similarity estimation in both offline and online scenarios. Our approaches have a solid theoretical guarantee of performance and excellent potential for parallel execution on large-scale data. Empirical evaluations demonstrate their high effectiveness and efficiency with significantly improved results over classical imputation-based methods, benefiting downstream applications with superior performance. Our code is available at https://github.com/CUHKSZ-Yu/OnMC.
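
To make the PSD idea in the abstract concrete, here is a minimal sketch of one generic way to "correct" a similarity estimate: symmetrize it and clip its negative eigenvalues to zero, which gives the nearest PSD matrix in Frobenius norm. This is a textbook projection used purely for illustration; it is not taken from the paper and should not be read as the authors' offline or online algorithm. The function name `project_to_psd` and the example matrix are hypothetical.

```python
import numpy as np


def project_to_psd(S):
    """Return the Frobenius-nearest PSD matrix to a square matrix S.

    Assumption: a generic eigenvalue-clipping projection, shown only to
    illustrate the PSD constraint; it is not the paper's method.
    """
    S = np.asarray(S, dtype=float)
    S_sym = (S + S.T) / 2.0                # enforce symmetry first
    eigvals, eigvecs = np.linalg.eigh(S_sym)
    clipped = np.clip(eigvals, 0.0, None)  # drop negative eigenvalues
    return (eigvecs * clipped) @ eigvecs.T


# Hypothetical inconsistent estimate: items 1-2 and 2-3 look very similar,
# yet items 1-3 look very dissimilar, so the matrix cannot be PSD.
S_est = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

S_fixed = project_to_psd(S_est)
print("min eigenvalue before:", np.linalg.eigvalsh(S_est).min())
print("min eigenvalue after: ", np.linalg.eigvalsh(S_fixed).min())
```

Note that plain eigenvalue clipping does not keep the diagonal equal to 1, so a practical correction for cosine-style similarities would need an extra constraint or a rescaling step.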

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-yu23a,
  title = {Online estimation of similarity matrices with incomplete data},
  author = {Yu, Fangchen and Zeng, Yicheng and Mao, Jianfeng and Li, Wenye},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = {2454--2464},
  year = {2023},
  editor = {Evans, Robin J. and Shpitser, Ilya},
  volume = {216},
  series = {Proceedings of Machine Learning Research},
  month = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v216/yu23a/yu23a.pdf},
  url = {https://proceedings.mlr.press/v216/yu23a.html},
  abstract = {The similarity matrix measures pairwise similarities between a set of data points and is an essential concept in data processing, routinely used in practical applications. Obtaining a similarity matrix is typically straightforward when data points are completely observed. However, incomplete observations can make it challenging to obtain a high-quality similarity matrix, which becomes even more complex in online data. To address this challenge, we propose matrix correction algorithms that leverage the positive semi-definiteness (PSD) of the similarity matrix to improve similarity estimation in both offline and online scenarios. Our approaches have a solid theoretical guarantee of performance and excellent potential for parallel execution on large-scale data. Empirical evaluations demonstrate their high effectiveness and efficiency with significantly improved results over classical imputation-based methods, benefiting downstream applications with superior performance. Our code is available at \url{https://github.com/CUHKSZ-Yu/OnMC}.}
}
Endnote
%0 Conference Paper
%T Online estimation of similarity matrices with incomplete data
%A Fangchen Yu
%A Yicheng Zeng
%A Jianfeng Mao
%A Wenye Li
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-yu23a
%I PMLR
%P 2454--2464
%U https://proceedings.mlr.press/v216/yu23a.html
%V 216
%X The similarity matrix measures pairwise similarities between a set of data points and is an essential concept in data processing, routinely used in practical applications. Obtaining a similarity matrix is typically straightforward when data points are completely observed. However, incomplete observations can make it challenging to obtain a high-quality similarity matrix, which becomes even more complex in online data. To address this challenge, we propose matrix correction algorithms that leverage the positive semi-definiteness (PSD) of the similarity matrix to improve similarity estimation in both offline and online scenarios. Our approaches have a solid theoretical guarantee of performance and excellent potential for parallel execution on large-scale data. Empirical evaluations demonstrate their high effectiveness and efficiency with significantly improved results over classical imputation-based methods, benefiting downstream applications with superior performance. Our code is available at https://github.com/CUHKSZ-Yu/OnMC.
APA
Yu, F., Zeng, Y., Mao, J., & Li, W. (2023). Online estimation of similarity matrices with incomplete data. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2454-2464. Available from https://proceedings.mlr.press/v216/yu23a.html.
