Improved Representation Learning Through Tensorized Autoencoders

Pascal Esser; Satyaki Mukherjee; Mahalakshmi Sabanayagam; Debarghya Ghoshdastidar

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8294-8307, 2023.

Abstract

The central question in representation learning is what constitutes a good or meaningful representation. In this work we argue that if we consider data with inherent cluster structures, where clusters can be characterized through different means and covariances, those data structures should be represented in the embedding as well. While Autoencoders (AE) are widely used in practice for unsupervised representation learning, they do not fulfil the above condition on the embedding, as they obtain a single representation of the data. To overcome this, we propose a meta-algorithm that can be used to extend an arbitrary AE architecture to a tensorized version (TAE) that allows for learning cluster-specific embeddings while simultaneously learning the cluster assignment. For the linear setting, we prove that TAE can recover the principal components of the different clusters, in contrast to the principal components of the entire data recovered by a standard AE. We validate this on planted models, and for general, non-linear and convolutional AEs we empirically illustrate that tensorizing the AE is beneficial in clustering and de-noising tasks.
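The contrast drawn for the linear setting can be illustrated with a small NumPy sketch. This is not the TAE algorithm from the paper: it assumes the cluster labels are already known (TAE learns them jointly with the embeddings), it uses plain PCA via SVD as a stand-in for an optimal linear AE, and the planted data, the helper `top_principal_directions`, and all variable names are hypothetical choices made only for illustration.

```python
import numpy as np


def top_principal_directions(X, h):
    """Return the top-h principal directions of X (rows are samples)."""
    Xc = X - X.mean(axis=0)  # centre on the data's own mean
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:h]


rng = np.random.default_rng(0)
n = 500

# Hypothetical planted model: two clusters with different means and covariances.
# Cluster 0 varies mostly along the first axis, cluster 1 along the second.
X0 = rng.normal(size=(n, 2)) * np.array([3.0, 0.3]) + np.array([0.0, 5.0])
X1 = rng.normal(size=(n, 2)) * np.array([0.3, 3.0]) + np.array([5.0, 0.0])
X = np.vstack([X0, X1])

# One shared representation: principal direction of the entire data.
global_dir = top_principal_directions(X, 1)[0]

# Cluster-specific representations (labels assumed known in this sketch).
dir0 = top_principal_directions(X0, 1)[0]
dir1 = top_principal_directions(X1, 1)[0]

print("entire data:", np.round(global_dir, 2))
print("cluster 0:  ", np.round(dir0, 2))
print("cluster 1:  ", np.round(dir1, 2))
```

In this toy setup the direction fitted to the entire data is dominated by the gap between the two cluster means and lies along the diagonal, so it matches neither cluster's main axis of variation, while the per-cluster directions align with the first and second coordinate axes respectively.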

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-esser23a,
  title = 	 {Improved Representation Learning Through Tensorized Autoencoders},
  author =       {Esser, Pascal and Mukherjee, Satyaki and Sabanayagam, Mahalakshmi and Ghoshdastidar, Debarghya},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {8294--8307},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/esser23a/esser23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/esser23a.html},
  abstract = 	 {The central question in representation learning is what constitutes a good or meaningful representation. In this work we argue that if we consider data with inherent cluster structures, where clusters can be characterized through different means and covariances, those data structures should be represented in the embedding as well. While Autoencoders (AE) are widely used in practice for unsupervised representation learning, they do not fulfil the above condition on the embedding as they obtain a single representation of the data. To overcome this we propose a meta-algorithm that can be used to extend an arbitrary AE architecture to a tensorized version (TAE) that allows for learning cluster-specific embeddings while simultaneously learning the cluster assignment. For the linear setting we prove that TAE can recover the principal components of the different clusters in contrast to principal component of the entire data recovered by a standard AE. We validate this on planted models and for general, non-linear and convolutional AEs we empirically illustrate that tensorizing the AE is beneficial in clustering and de-noising tasks.}
}

Endnote

%0 Conference Paper
%T Improved Representation Learning Through Tensorized Autoencoders
%A Pascal Esser
%A Satyaki Mukherjee
%A Mahalakshmi Sabanayagam
%A Debarghya Ghoshdastidar
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-esser23a
%I PMLR
%P 8294--8307
%U https://proceedings.mlr.press/v206/esser23a.html
%V 206
%X The central question in representation learning is what constitutes a good or meaningful representation. In this work we argue that if we consider data with inherent cluster structures, where clusters can be characterized through different means and covariances, those data structures should be represented in the embedding as well. While Autoencoders (AE) are widely used in practice for unsupervised representation learning, they do not fulfil the above condition on the embedding as they obtain a single representation of the data. To overcome this we propose a meta-algorithm that can be used to extend an arbitrary AE architecture to a tensorized version (TAE) that allows for learning cluster-specific embeddings while simultaneously learning the cluster assignment. For the linear setting we prove that TAE can recover the principal components of the different clusters in contrast to principal component of the entire data recovered by a standard AE. We validate this on planted models and for general, non-linear and convolutional AEs we empirically illustrate that tensorizing the AE is beneficial in clustering and de-noising tasks.

APA

Esser, P., Mukherjee, S., Sabanayagam, M. & Ghoshdastidar, D. (2023). Improved Representation Learning Through Tensorized Autoencoders. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8294-8307. Available from https://proceedings.mlr.press/v206/esser23a.html.
