Open-Domain Text Evaluation via Contrastive Distribution Methods

Sidi Lu, Hongyi Liu, Asli Celikyilmaz, Tianlu Wang, Nanyun Peng
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33057-33068, 2024.

Abstract

Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models’ generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the contrast of two probabilistic distributions – one known to be superior to the other – to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) Generative CDM, which harnesses the contrast of two language models’ distributions to generate synthetic examples for training discriminator-based metrics; 2) Discriminative CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgments than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.
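To make the discriminative paradigm concrete, the sketch below scores a candidate text by the gap in average per-token log-likelihood between a stronger and a weaker language model. This is an illustrative assumption, not the paper's exact metric: the model pair (gpt2-large vs. gpt2) and the simple log-likelihood difference are placeholders standing in for the "superior" and "inferior" distributions the abstract describes.

    # A minimal sketch of the discriminative-CDM intuition (an assumption,
    # not the paper's exact formulation): score text by how much more a
    # stronger LM likes it than a weaker LM does. Models are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def mean_log_likelihood(model, tokenizer, text):
        # Average per-token log-likelihood of `text` under `model`.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean negative log-likelihood
        return -loss.item()

    def contrastive_score(strong, weak, tokenizer, text):
        # Positive when the stronger model assigns the text relatively
        # higher likelihood -- a crude proxy for quality under the CDM view.
        return (mean_log_likelihood(strong, tokenizer, text)
                - mean_log_likelihood(weak, tokenizer, text))

    # gpt2 and gpt2-large share a tokenizer, so one instance serves both.
    tok = AutoTokenizer.from_pretrained("gpt2")
    strong = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
    weak = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    print(contrastive_score(strong, weak, tok,
                            "The referee blew the whistle and the match resumed."))

Under this view, the generative paradigm would instead use the two distributions to sample contrastive training data for a learned discriminator; the snippet above covers only the direct, discriminative scoring case.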

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-lu24f,
  title = {Open-Domain Text Evaluation via Contrastive Distribution Methods},
  author = {Lu, Sidi and Liu, Hongyi and Celikyilmaz, Asli and Wang, Tianlu and Peng, Nanyun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {33057--33068},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lu24f/lu24f.pdf},
  url = {https://proceedings.mlr.press/v235/lu24f.html},
  abstract = {Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models’ generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the contrast of two probabilistic distributions – one known to be superior to the other – to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) Generative CDM, which harnesses the contrast of two language models’ distributions to generate synthetic examples for training discriminator-based metrics; 2) Discriminative CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgments than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.}
}
Endnote
%0 Conference Paper
%T Open-Domain Text Evaluation via Contrastive Distribution Methods
%A Sidi Lu
%A Hongyi Liu
%A Asli Celikyilmaz
%A Tianlu Wang
%A Nanyun Peng
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-lu24f
%I PMLR
%P 33057--33068
%U https://proceedings.mlr.press/v235/lu24f.html
%V 235
%X Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models’ generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the contrast of two probabilistic distributions – one known to be superior to the other – to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) Generative CDM, which harnesses the contrast of two language models’ distributions to generate synthetic examples for training discriminator-based metrics; 2) Discriminative CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgments than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.
APA
Lu, S., Liu, H., Celikyilmaz, A., Wang, T. & Peng, N. (2024). Open-Domain Text Evaluation via Contrastive Distribution Methods. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:33057-33068. Available from https://proceedings.mlr.press/v235/lu24f.html.
