Open-Domain Text Evaluation via Contrastive Distribution Methods
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33057-33068, 2024.
Abstract
Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models’ generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the contrast of two probabilistic distributions – one known to be superior to the other – to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) Generative CDM, which harnesses the contrast of two language models’ distributions to generate synthetic examples for training discriminator-based metrics; 2) Discriminative CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgment than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.
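To make the Discriminative CDM idea more concrete, the following is a minimal sketch of one way a contrast between two language models could be turned into a score. The abstract does not give the exact formulation, so this is an assumption for illustration only: it takes the difference between the sequence log-likelihoods assigned by a stronger and a weaker model. The function names `sequence_log_prob` and `contrast_score`, and the use of precomputed per-token probabilities as inputs, are hypothetical.

```python
import math
from typing import Sequence


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """Sum the log-probabilities a model assigned to each token of one text."""
    return sum(math.log(p) for p in token_probs)


def contrast_score(
    strong_token_probs: Sequence[float],
    weak_token_probs: Sequence[float],
) -> float:
    """Illustrative contrast score (an assumption, not the paper's formula).

    Returns log p_strong(text) - log p_weak(text). A positive value means the
    stronger model finds the text more likely than the weaker model does.
    """
    if len(strong_token_probs) != len(weak_token_probs):
        raise ValueError("both models must score the same token sequence")
    return sequence_log_prob(strong_token_probs) - sequence_log_prob(weak_token_probs)


# Hypothetical per-token probabilities for the same three-token response.
strong = [0.5, 0.4, 0.8]
weak = [0.25, 0.4, 0.2]
score = contrast_score(strong, weak)
```

In practice the per-token probabilities would come from two real language models of different capacity evaluated on the same tokenized text; here they are made-up numbers so the sketch stays self-contained.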