The Latentverse: An Open-Source Benchmarking Toolkit for Evaluating Latent Representations

Yoanna Turura, Sam Freesun Friedman, Aurora Cremer, Mahnaz Maddah, Sana Tonekaboni
Proceedings of the sixth Conference on Health, Inference, and Learning, PMLR 287:708-719, 2025.

Abstract

Self-supervised representation learning is a powerful approach for extracting meaningful features without relying on large amounts of labeled data, making it particularly valuable in fields like healthcare. This enables pretrained models to be shared and fine-tuned with minimal data for various downstream applications. However, evaluating the quality and behavior of these representations remains challenging. To address this, we introduce Latentverse, an open-source library and web-based platform for evaluating latent representations. Latentverse generates detailed reports with visualizations and metrics that provide a comprehensive perspective on different properties of representations, such as clustering, disentanglement, generalization, expressiveness, and robustness. It also allows for the comparison of different representations, enabling developers to refine model architectures and helping users assess how well an embedding model aligns with the requirements of their specific applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v287-turura25a,
  title = {The Latentverse: An Open-Source Benchmarking Toolkit for Evaluating Latent Representations},
  author = {Turura, Yoanna and Friedman, Sam Freesun and Cremer, Aurora and Maddah, Mahnaz and Tonekaboni, Sana},
  booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
  pages = {708--719},
  year = {2025},
  editor = {Xu, Xuhai Orson and Choi, Edward and Singhal, Pankhuri and Gerych, Walter and Tang, Shengpu and Agrawal, Monica and Subbaswamy, Adarsh and Sizikova, Elena and Dunn, Jessilyn and Daneshjou, Roxana and Sarker, Tasmie and McDermott, Matthew and Chen, Irene},
  volume = {287},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Jun},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v287/main/assets/turura25a/turura25a.pdf},
  url = {https://proceedings.mlr.press/v287/turura25a.html},
  abstract = {Self-supervised representation learning is a powerful approach for extracting meaningful features without relying on large amounts of labeled data, making it particularly valuable in fields like healthcare. This enables pretrained models to be shared and fine-tuned with minimal data for various downstream applications. However, evaluating the quality and behavior of these representations remains challenging. To address this, we introduce Latentverse, an open-source library and web-based platform for evaluating latent representations. Latentverse generates detailed reports with visualizations and metrics that provide a comprehensive perspective on different properties of representations, such as clustering, disentanglement, generalization, expressiveness, and robustness. It also allows for the comparison of different representations, enabling developers to refine model architectures and helping users assess how well an embedding model aligns with the requirements of their specific applications.}
}
Endnote
%0 Conference Paper
%T The Latentverse: An Open-Source Benchmarking Toolkit for Evaluating Latent Representations
%A Yoanna Turura
%A Sam Freesun Friedman
%A Aurora Cremer
%A Mahnaz Maddah
%A Sana Tonekaboni
%B Proceedings of the sixth Conference on Health, Inference, and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Xuhai Orson Xu
%E Edward Choi
%E Pankhuri Singhal
%E Walter Gerych
%E Shengpu Tang
%E Monica Agrawal
%E Adarsh Subbaswamy
%E Elena Sizikova
%E Jessilyn Dunn
%E Roxana Daneshjou
%E Tasmie Sarker
%E Matthew McDermott
%E Irene Chen
%F pmlr-v287-turura25a
%I PMLR
%P 708--719
%U https://proceedings.mlr.press/v287/turura25a.html
%V 287
%X Self-supervised representation learning is a powerful approach for extracting meaningful features without relying on large amounts of labeled data, making it particularly valuable in fields like healthcare. This enables pretrained models to be shared and fine-tuned with minimal data for various downstream applications. However, evaluating the quality and behavior of these representations remains challenging. To address this, we introduce Latentverse, an open-source library and web-based platform for evaluating latent representations. Latentverse generates detailed reports with visualizations and metrics that provide a comprehensive perspective on different properties of representations, such as clustering, disentanglement, generalization, expressiveness, and robustness. It also allows for the comparison of different representations, enabling developers to refine model architectures and helping users assess how well an embedding model aligns with the requirements of their specific applications.
APA
Turura, Y., Friedman, S. F., Cremer, A., Maddah, M., & Tonekaboni, S. (2025). The Latentverse: An Open-Source Benchmarking Toolkit for Evaluating Latent Representations. Proceedings of the sixth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 287:708-719. Available from https://proceedings.mlr.press/v287/turura25a.html.

Related Material