Describing Differences between Text Distributions with Natural Language

Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:27099-27116, 2022.

Abstract

How do two distributions of text differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by “learning a natural language hypothesis”: given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e.g., “is military-related.” To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: “[samples of $D_{0}$] + [samples of $D_{1}$] + the difference between them is ____”. We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier. On a benchmark of 54 real-world binary classification tasks, while GPT-3 Curie (13B) only generates a description similar to human annotation 7% of the time, the performance reaches 61% with fine-tuning and re-ranking, and our best system using GPT-3 Davinci (175B) reaches 76%. We apply our system to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and present analyses based on automatically generated descriptions.
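As a rough illustration of the propose-then-verify pipeline the abstract describes (sample candidate descriptions from a prompt over a few examples, then re-rank them with a verifier on a larger held-out set), here is a minimal Python sketch. All names, the prompt layout, and the keyword-overlap verifier are illustrative assumptions, not the authors' implementation.

```python
# A minimal, runnable sketch of the propose-then-verify loop from the
# abstract. `propose_descriptions` and `verify` are hypothetical stand-ins
# for the fine-tuned GPT-3 proposer and the learned verifier.

def propose_descriptions(samples_d0, samples_d1, n_candidates=4):
    """Stand-in for the fine-tuned proposer. A real system would feed the
    prompt below to the fine-tuned language model and sample completions;
    here we return fixed placeholders so the sketch runs end to end."""
    prompt = (  # shown only to illustrate the input format
        "Group A:\n" + "\n".join(samples_d0) + "\n\n"
        "Group B:\n" + "\n".join(samples_d1) + "\n\n"
        "The difference between them is"
    )
    return [f"placeholder description {i}" for i in range(n_candidates)]

def verify(description, sample):
    """Stand-in for the learned verifier, which judges whether a
    description holds for one sample. A trivial keyword-overlap check
    is used here as a placeholder."""
    return any(word in sample.lower() for word in description.lower().split())

def rerank(candidates, held_out_d0, held_out_d1):
    """Re-rank candidates by how much more often the verifier says they
    hold on D1 than on D0, estimated on larger held-out sample sets."""
    def score(desc):
        p1 = sum(verify(desc, s) for s in held_out_d1) / len(held_out_d1)
        p0 = sum(verify(desc, s) for s in held_out_d0) / len(held_out_d0)
        return p1 - p0
    return sorted(candidates, key=score, reverse=True)

if __name__ == "__main__":
    d0 = ["the senate passed a budget bill", "stocks rallied on friday"]
    d1 = ["troops were deployed overseas", "the army conducted exercises"]
    best = rerank(propose_descriptions(d0, d1), d0, d1)[0]
    print("top-ranked description:", best)
```

The design point the abstract emphasizes is the split of labor: the proposer only sees the few samples that fit in a prompt, while the verifier scores each candidate against many more samples, demoting descriptions that happen to fit the prompt but not the underlying distributions.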

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-zhong22a,
  title     = {Describing Differences between Text Distributions with Natural Language},
  author    = {Zhong, Ruiqi and Snell, Charlie and Klein, Dan and Steinhardt, Jacob},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {27099--27116},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/zhong22a/zhong22a.pdf},
  url       = {https://proceedings.mlr.press/v162/zhong22a.html}
}
APA
Zhong, R., Snell, C., Klein, D., & Steinhardt, J. (2022). Describing Differences between Text Distributions with Natural Language. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:27099-27116. Available from https://proceedings.mlr.press/v162/zhong22a.html.