Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

Ardavan Saeedi, Matthew Hoffman, Stephen DiVerdi, Asma Ghandeharioun, Matthew Johnson, Ryan Adams
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1309-1317, 2018.

Abstract

Professional-grade software applications are powerful but complicated – expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no single "correct" edit for a given image – different experts may make very different aesthetic decisions when faced with the same image, and a single expert may make different choices depending on the intended use of the image (or on a whim). We therefore want a system that can propose multiple diverse, high-quality edits while also learning from and adapting to a user’s aesthetic preferences. In this work, we develop a statistical model that meets these objectives. Our model builds on recent advances in neural network generative modeling and scalable inference, and uses hierarchical structure to learn editing patterns across many diverse users. Empirically, we find that our model outperforms other approaches on this challenging multimodal prediction task.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-saeedi18a,
  title = {Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models},
  author = {Saeedi, Ardavan and Hoffman, Matthew and DiVerdi, Stephen and Ghandeharioun, Asma and Johnson, Matthew and Adams, Ryan},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = {1309--1317},
  year = {2018},
  editor = {Storkey, Amos and Perez-Cruz, Fernando},
  volume = {84},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v84/saeedi18a/saeedi18a.pdf},
  url = {https://proceedings.mlr.press/v84/saeedi18a.html},
  abstract = {Professional-grade software applications are powerful but complicated – expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no single "correct" edit for a given image – different experts may make very different aesthetic decisions when faced with the same image, and a single expert may make different choices depending on the intended use of the image (or on a whim). We therefore want a system that can propose multiple diverse, high-quality edits while also learning from and adapting to a user’s aesthetic preferences. In this work, we develop a statistical model that meets these objectives. Our model builds on recent advances in neural network generative modeling and scalable inference, and uses hierarchical structure to learn editing patterns across many diverse users. Empirically, we find that our model outperforms other approaches on this challenging multimodal prediction task.}
}
Endnote
%0 Conference Paper
%T Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models
%A Ardavan Saeedi
%A Matthew Hoffman
%A Stephen DiVerdi
%A Asma Ghandeharioun
%A Matthew Johnson
%A Ryan Adams
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-saeedi18a
%I PMLR
%P 1309--1317
%U https://proceedings.mlr.press/v84/saeedi18a.html
%V 84
%X Professional-grade software applications are powerful but complicated – expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no single "correct" edit for a given image – different experts may make very different aesthetic decisions when faced with the same image, and a single expert may make different choices depending on the intended use of the image (or on a whim). We therefore want a system that can propose multiple diverse, high-quality edits while also learning from and adapting to a user’s aesthetic preferences. In this work, we develop a statistical model that meets these objectives. Our model builds on recent advances in neural network generative modeling and scalable inference, and uses hierarchical structure to learn editing patterns across many diverse users. Empirically, we find that our model outperforms other approaches on this challenging multimodal prediction task.
APA
Saeedi, A., Hoffman, M., DiVerdi, S., Ghandeharioun, A., Johnson, M., & Adams, R. (2018). Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1309-1317. Available from https://proceedings.mlr.press/v84/saeedi18a.html.
