LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

Yujun Shi; Jun Hao Liew; Hanshu Yan; Vincent Tan; Jiashi Feng

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:54981-54993, 2025.

Abstract

Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based framework using Generative Adversarial Networks, and subsequent studies have leveraged large-scale diffusion models. However, these methods often require over a minute per edit and exhibit low success rates. We present LightningDrag, which achieves high-quality drag-based editing in about one second on general images. By redefining drag-based editing as a conditional generation task, we eliminate the need for time-consuming latent optimization or gradient-based guidance. Our model is trained on large-scale paired video frames, capturing diverse motion (object translations, pose shifts, zooming, etc.) to significantly improve accuracy and consistency. Despite being trained only on videos, our model generalizes to local deformations beyond the training data (e.g., lengthening hair, twisting rainbows). Extensive evaluations confirm the superiority of our approach, and we will release both code and model.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-shi25g,
  title = 	 {{L}ightning{D}rag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos},
  author =       {Shi, Yujun and Liew, Jun Hao and Yan, Hanshu and Tan, Vincent and Feng, Jiashi},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {54981--54993},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/shi25g/shi25g.pdf},
  url = 	 {https://proceedings.mlr.press/v267/shi25g.html},
  abstract = 	 {Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based framework using Generative Adversarial Networks, and subsequent studies have leveraged large-scale diffusion models. However, these methods often require over a minute per edit and exhibit low success rates. We present LightningDrag, which achieves high-quality drag-based editing in about one second on general images. By redefining drag-based editing as a conditional generation task, we eliminate the need for time-consuming latent optimization or gradient-based guidance. Our model is trained on large-scale paired video frames, capturing diverse motion (object translations, pose shifts, zooming, etc.) to significantly improve accuracy and consistency. Despite being trained only on videos, our model generalizes to local deformations beyond the training data (e.g., lengthening hair, twisting rainbows). Extensive evaluations confirm the superiority of our approach, and we will release both code and model.}
}

Endnote

%0 Conference Paper
%T LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
%A Yujun Shi
%A Jun Hao Liew
%A Hanshu Yan
%A Vincent Tan
%A Jiashi Feng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-shi25g
%I PMLR
%P 54981--54993
%U https://proceedings.mlr.press/v267/shi25g.html
%V 267
%X Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based framework using Generative Adversarial Networks, and subsequent studies have leveraged large-scale diffusion models. However, these methods often require over a minute per edit and exhibit low success rates. We present LightningDrag, which achieves high-quality drag-based editing in about one second on general images. By redefining drag-based editing as a conditional generation task, we eliminate the need for time-consuming latent optimization or gradient-based guidance. Our model is trained on large-scale paired video frames, capturing diverse motion (object translations, pose shifts, zooming, etc.) to significantly improve accuracy and consistency. Despite being trained only on videos, our model generalizes to local deformations beyond the training data (e.g., lengthening hair, twisting rainbows). Extensive evaluations confirm the superiority of our approach, and we will release both code and model.

APA

Shi, Y., Liew, J.H., Yan, H., Tan, V. & Feng, J. (2025). LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:54981-54993. Available from https://proceedings.mlr.press/v267/shi25g.html.
