Matrix Information Theory for Self-Supervised Learning

Yifan Zhang; Zhiquan Tan; Jingqin Yang; Weiran Huang; Yang Yuan

Matrix Information Theory for Self-Supervised Learning

Yifan Zhang, Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:59897-59918, 2024.

Abstract

The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-zhang24bi,
  title = 	 {Matrix Information Theory for Self-Supervised Learning},
  author =       {Zhang, Yifan and Tan, Zhiquan and Yang, Jingqin and Huang, Weiran and Yuan, Yang},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {59897--59918},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24bi/zhang24bi.pdf},
  url = 	 {https://proceedings.mlr.press/v235/zhang24bi.html},
  abstract = 	 {The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss.}
}

Endnote

%0 Conference Paper
%T Matrix Information Theory for Self-Supervised Learning
%A Yifan Zhang
%A Zhiquan Tan
%A Jingqin Yang
%A Weiran Huang
%A Yang Yuan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp	
%F pmlr-v235-zhang24bi
%I PMLR
%P 59897--59918
%U https://proceedings.mlr.press/v235/zhang24bi.html
%V 235
%X The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss.

APA


Zhang, Y., Tan, Z., Yang, J., Huang, W. & Yuan, Y.. (2024). Matrix Information Theory for Self-Supervised Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:59897-59918 Available from https://proceedings.mlr.press/v235/zhang24bi.html.

Matrix Information Theory for Self-Supervised Learning

Abstract

Cite this Paper

Related Material