Preference Adaptive and Sequential Text-to-Image Generation

Ofir Nabati, Guy Tennenholtz, Chihwei Hsu, Moonkyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:45362-45394, 2025.

Abstract

We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user’s intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
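Below is a minimal, self-contained Python sketch of the interaction loop the abstract describes, not the paper's implementation: at each turn a toy value estimate ranks candidate prompt expansions into a slate, and a simulated multinomial-logit user-choice model picks one, so the running prompt adapts to the user's (hidden) preference. The candidate expansions, the scoring heuristic, and the hidden preference keywords are all hypothetical placeholders standing in for the LMM proposals, the learned value-based RL policy, and the EM-fitted user-choice model.

import math
import random

random.seed(0)

# Hypothetical prompt expansions an LMM might propose; hard-coded purely for illustration.
CANDIDATES = [
    "in watercolor style", "at golden hour", "soft golden lighting",
    "highly detailed, 8k", "minimalist line art", "dramatic lighting",
]

def agent_value(candidate, chosen_history):
    """Crude stand-in for a learned value estimate: favor expansions whose words
    overlap with what the user already chose, plus a little noise for exploration."""
    history_words = {w for c in chosen_history for w in c.split()}
    return sum(1.0 for w in candidate.split() if w in history_words) + 0.1 * random.random()

def select_slate(candidates, chosen_history, k=3):
    """Greedy top-k slate by estimated value (a toy proxy for the value-based RL policy)."""
    return sorted(candidates, key=lambda c: agent_value(c, chosen_history), reverse=True)[:k]

def simulated_user_choice(slate, hidden_keywords, temperature=0.5):
    """Multinomial-logit user-choice model over the slate; the hidden keywords play
    the role of a latent user preference type, unknown to the agent."""
    utils = [sum(1.0 for w in hidden_keywords if w in c) / temperature for c in slate]
    z = sum(math.exp(u) for u in utils)
    probs = [math.exp(u) / z for u in utils]
    r, acc = random.random(), 0.0
    for c, p in zip(slate, probs):
        acc += p
        if r <= acc:
            return c
    return slate[-1]

prompt, history = "a castle on a cliff", []
hidden_pref = {"golden", "lighting"}  # latent user preference, never shown to the agent
for turn in range(3):
    slate = select_slate(CANDIDATES, history)
    choice = simulated_user_choice(slate, hidden_pref)
    history.append(choice)
    prompt += ", " + choice
    print(f"turn {turn}: slate={slate} -> user chose {choice!r}")
print("final prompt:", prompt)

Because the agent scores candidates by overlap with the user's previous choices, later slates drift toward the user's latent preference; in the paper this adaptivity comes from a learned value function rather than this keyword heuristic.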

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-nabati25a,
  title     = {Preference Adaptive and Sequential Text-to-Image Generation},
  author    = {Nabati, Ofir and Tennenholtz, Guy and Hsu, Chihwei and Ryu, Moonkyung and Ramachandran, Deepak and Chow, Yinlam and Li, Xiang and Boutilier, Craig},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {45362--45394},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nabati25a/nabati25a.pdf},
  url       = {https://proceedings.mlr.press/v267/nabati25a.html}
}
APA
Nabati, O., Tennenholtz, G., Hsu, C., Ryu, M., Ramachandran, D., Chow, Y., Li, X. & Boutilier, C. (2025). Preference Adaptive and Sequential Text-to-Image Generation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:45362-45394. Available from https://proceedings.mlr.press/v267/nabati25a.html.
