A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Nikunj Saunshi, Orestis Plevrakis, Sanjeev Arora, Mikhail Khodak, Hrishikesh Khandeparkar
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5628-5637, 2019.

Abstract

Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes. Our generalization bound also shows that learned representations can reduce (labeled) sample complexity on downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.
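The objective described in the abstract amounts to making the inner product ⟨f(x), f(x⁺)⟩ for a similar pair exceed ⟨f(x), f(x⁻)⟩ for a negative sample, typically through a logistic-style loss. The sketch below is an illustrative PyTorch version of such a loss with one negative per anchor; it is not the authors' code, and the names (contrastive_loss, f_x, f_pos, f_neg) are hypothetical.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f_x, f_pos, f_neg):
    """Logistic contrastive loss over (anchor, positive, negative) triples.

    f_x, f_pos, f_neg: (batch, dim) representations of x, x+, x-.
    The loss is small when <f(x), f(x+)> exceeds <f(x), f(x-)> on average.
    """
    pos_sim = (f_x * f_pos).sum(dim=1)   # <f(x), f(x+)>
    neg_sim = (f_x * f_neg).sum(dim=1)   # <f(x), f(x-)>
    # softplus(a) = log(1 + exp(a)), so this penalizes negatives that
    # score higher than positives.
    return F.softplus(neg_sim - pos_sim).mean()

# Example usage with random representations (illustrative only):
f_x, f_pos, f_neg = (torch.randn(32, 128) for _ in range(3))
loss = contrastive_loss(f_x, f_pos, f_neg)
```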

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-saunshi19a,
  title     = {A Theoretical Analysis of Contrastive Unsupervised Representation Learning},
  author    = {Saunshi, Nikunj and Plevrakis, Orestis and Arora, Sanjeev and Khodak, Mikhail and Khandeparkar, Hrishikesh},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5628--5637},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/saunshi19a/saunshi19a.pdf},
  url       = {https://proceedings.mlr.press/v97/saunshi19a.html},
  abstract  = {Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes. Our generalization bound also shows that learned representations can reduce (labeled) sample complexity on downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.}
}
Endnote
%0 Conference Paper
%T A Theoretical Analysis of Contrastive Unsupervised Representation Learning
%A Nikunj Saunshi
%A Orestis Plevrakis
%A Sanjeev Arora
%A Mikhail Khodak
%A Hrishikesh Khandeparkar
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-saunshi19a
%I PMLR
%P 5628--5637
%U https://proceedings.mlr.press/v97/saunshi19a.html
%V 97
%X Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes. Our generalization bound also shows that learned representations can reduce (labeled) sample complexity on downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.
APA
Saunshi, N., Plevrakis, O., Arora, S., Khodak, M. & Khandeparkar, H. (2019). A Theoretical Analysis of Contrastive Unsupervised Representation Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5628-5637. Available from https://proceedings.mlr.press/v97/saunshi19a.html.
