Universal Video Face Restoration Method Based on Vision-Language Model

Yipiao Xu, Zhenbo Song, Jianfeng Lu
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:367-382, 2025.

Abstract

Video face restoration aims to recover high-quality face videos from low-quality ones, but most existing methods focus on a single, specific degradation such as denoising or deblurring. Universal video face restoration, by contrast, should restore face videos under various degradations. In this paper, we use a language prompt that describes face information, including gender, appearance, and expression, to guide video face restoration. To broaden applicability, we remove the need for the language prompt via ControlNet and incorporate human-level knowledge from vision-language models into general networks, improving video face restoration performance and enabling universal video face restoration. In addition, we construct a degradation dataset that contains multiple degradations in the same scene together with captions describing the face information. Extensive experiments show that our approach achieves highly competitive performance in universal video face restoration.
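As a rough illustration only (not the authors' released code or exact architecture), the sketch below shows one way a caption embedding from a CLIP-like text encoder could condition a video restoration backbone through cross-attention, in the spirit of the prompt-guided restoration described in the abstract. All names here (PromptCrossAttention, TinyRestorer, the random stand-in caption embedding) are hypothetical; the paper's actual network, prompt handling, and ControlNet integration may differ.

# Illustrative sketch only -- NOT the method from the paper. A random tensor
# stands in for CLIP caption token embeddings; a toy conv backbone is fused
# with the prompt via cross-attention to show the conditioning idea.
import torch
import torch.nn as nn


class PromptCrossAttention(nn.Module):
    """Fuses a text-prompt embedding into video features via cross-attention."""

    def __init__(self, feat_dim: int, text_dim: int, num_heads: int = 4):
        super().__init__()
        self.to_kv = nn.Linear(text_dim, feat_dim)           # project prompt tokens
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feats: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # feats:  (B*T, N, C) flattened spatial tokens of each frame
        # prompt: (B*T, L, D) caption token embeddings
        kv = self.to_kv(prompt)
        out, _ = self.attn(query=feats, key=kv, value=kv)
        return self.norm(feats + out)                        # residual fusion


class TinyRestorer(nn.Module):
    """Toy restoration backbone: conv encoder -> prompt fusion -> conv decoder."""

    def __init__(self, feat_dim: int = 64, text_dim: int = 512):
        super().__init__()
        self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.fusion = PromptCrossAttention(feat_dim, text_dim)
        self.decoder = nn.Conv2d(feat_dim, 3, 3, padding=1)

    def forward(self, lq_frames: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # lq_frames: (B, T, 3, H, W) low-quality clip; prompt: (B, L, text_dim)
        b, t, c, h, w = lq_frames.shape
        x = self.encoder(lq_frames.view(b * t, c, h, w))          # (B*T, C, H, W)
        tokens = x.flatten(2).transpose(1, 2)                     # (B*T, H*W, C)
        prompt_rep = prompt.repeat_interleave(t, dim=0)           # repeat per frame
        tokens = self.fusion(tokens, prompt_rep)
        x = tokens.transpose(1, 2).reshape(b * t, -1, h, w)
        return self.decoder(x).view(b, t, c, h, w)


if __name__ == "__main__":
    clip = torch.rand(2, 5, 3, 64, 64)        # 2 clips x 5 frames of 64x64 faces
    caption_emb = torch.randn(2, 16, 512)     # stand-in for CLIP caption tokens
    restored = TinyRestorer()(clip, caption_emb)
    print(restored.shape)                     # torch.Size([2, 5, 3, 64, 64])

In this toy setup the caption embedding would come from a frozen vision-language text encoder; dropping the prompt branch and supplying the guidance through an auxiliary control network (as the abstract describes with ControlNet) is one way to remove the explicit language input at test time.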

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-xu25a,
  title     = {Universal Video Face Restoration Method Based on Vision-Language Model},
  author    = {Xu, Yipiao and Song, Zhenbo and Lu, Jianfeng},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages     = {367--382},
  year      = {2025},
  editor    = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume    = {260},
  series    = {Proceedings of Machine Learning Research},
  month     = {05--08 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/xu25a/xu25a.pdf},
  url       = {https://proceedings.mlr.press/v260/xu25a.html}
}
Endnote
%0 Conference Paper
%T Universal Video Face Restoration Method Based on Vision-Language Model
%A Yipiao Xu
%A Zhenbo Song
%A Jianfeng Lu
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-xu25a
%I PMLR
%P 367--382
%U https://proceedings.mlr.press/v260/xu25a.html
%V 260
APA
Xu, Y., Song, Z. & Lu, J. (2025). Universal Video Face Restoration Method Based on Vision-Language Model. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:367-382. Available from https://proceedings.mlr.press/v260/xu25a.html.
