Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation

Ilan Naiman, Nimrod Berman, Omri Azencot
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25694-25717, 2023.

Abstract

Unsupervised disentanglement is a long-standing challenge in representation learning. Recently, self-supervised techniques have achieved impressive results in the sequential setting, where data is time-dependent. However, these methods rely on modality-based data augmentations and random sampling, or on solving auxiliary tasks. In this work, we propose to avoid these requirements by generating, sampling, and comparing empirical distributions from the underlying variational model. Unlike existing work, we introduce a self-supervised sequential disentanglement framework based on contrastive estimation that requires no external signals, while using common batch sizes and samples drawn from the latent space itself. In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data. We evaluate our approach on video, audio, and time series benchmarks, where it achieves state-of-the-art results compared to existing techniques. The code is available at https://github.com/azencot-group/SPYL.
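
The abstract's central mechanism, contrasting views that are sampled from the model's own variational posterior rather than produced by modality-based augmentations, can be illustrated with a short sketch. The snippet below is not the authors' implementation (see the linked repository for that); it is a minimal, generic latent-space contrastive (InfoNCE) estimator under assumed names: a hypothetical encoder output (mu, logvar), a sample_latent_views helper, and an info_nce loss with in-batch negatives.

import torch
import torch.nn.functional as F

def sample_latent_views(mu, logvar, n_views=2):
    # Draw n_views reparameterized samples from each posterior N(mu, sigma^2);
    # each draw is an augmentation-free "view" of the same underlying sequence.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn(n_views, *mu.shape)           # (n_views, batch, dim)
    return mu.unsqueeze(0) + std.unsqueeze(0) * eps

def info_nce(views_a, views_b, temperature=0.1):
    # InfoNCE with in-batch negatives: diagonal pairs (two latent samples of
    # the same item) are positives; off-diagonal pairs act as negatives.
    a = F.normalize(views_a, dim=-1)
    b = F.normalize(views_b, dim=-1)
    logits = a @ b.T / temperature                  # (batch, batch) similarities
    labels = torch.arange(len(a))                   # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with a hypothetical encoder output for a batch of 8 sequences.
mu, logvar = torch.randn(8, 16), torch.zeros(8, 16)
v1, v2 = sample_latent_views(mu, logvar, n_views=2)
loss = info_nce(v1, v2)
print(loss.item())

This sketch captures only the augmentation-free, latent-sampling aspect described in the abstract; the paper's full method additionally generates and compares empirical distributions to disentangle the sequential factors.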

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-naiman23a,
  title     = {Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation},
  author    = {Naiman, Ilan and Berman, Nimrod and Azencot, Omri},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {25694--25717},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/naiman23a/naiman23a.pdf},
  url       = {https://proceedings.mlr.press/v202/naiman23a.html},
  abstract  = {Unsupervised disentanglement is a long-standing challenge in representation learning. Recently, self-supervised techniques achieved impressive results in the sequential setting, where data is time-dependent. However, the latter methods employ modality-based data augmentations and random sampling or solve auxiliary tasks. In this work, we propose to avoid that by generating, sampling, and comparing empirical distributions from the underlying variational model. Unlike existing work, we introduce a self-supervised sequential disentanglement framework based on contrastive estimation with no external signals, while using common batch sizes and samples from the latent space itself. In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data. We evaluate our approach on video, audio, and time series benchmarks. Our method presents state-of-the-art results in comparison to existing techniques. The code is available at https://github.com/azencot-group/SPYL.}
}
EndNote
%0 Conference Paper
%T Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation
%A Ilan Naiman
%A Nimrod Berman
%A Omri Azencot
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-naiman23a
%I PMLR
%P 25694--25717
%U https://proceedings.mlr.press/v202/naiman23a.html
%V 202
%X Unsupervised disentanglement is a long-standing challenge in representation learning. Recently, self-supervised techniques achieved impressive results in the sequential setting, where data is time-dependent. However, the latter methods employ modality-based data augmentations and random sampling or solve auxiliary tasks. In this work, we propose to avoid that by generating, sampling, and comparing empirical distributions from the underlying variational model. Unlike existing work, we introduce a self-supervised sequential disentanglement framework based on contrastive estimation with no external signals, while using common batch sizes and samples from the latent space itself. In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data. We evaluate our approach on video, audio, and time series benchmarks. Our method presents state-of-the-art results in comparison to existing techniques. The code is available at https://github.com/azencot-group/SPYL.
APA
Naiman, I., Berman, N. & Azencot, O. (2023). Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:25694-25717. Available from https://proceedings.mlr.press/v202/naiman23a.html.
