Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN

Qiucheng Wu, Yifan Jiang, Junru Wu, Kai Wang, Eric Zhang, Humphrey Shi, Zhangyang Wang, Shiyu Chang
Conference on Parsimony and Learning, PMLR 234:108-133, 2024.

Abstract

The disentanglement of the StyleGAN latent space has paved the way for realistic and controllable image editing, but does StyleGAN know anything about temporal motion, as it was only trained on static images? To study the motion features in the latent space of StyleGAN, in this paper, we hypothesize and demonstrate that a series of meaningful, natural, and versatile small, local movements (referred to as "micromotion", such as expression, head movement, and aging effects) can be represented in low-rank spaces extracted from the latent space of a conventionally pre-trained StyleGAN-v2 model for face generation, with the guidance of proper "anchors" in the form of either short text or video clips. Starting from one target face image, with the editing direction decoded from the low-rank space, its micromotion features can be represented as simply as an affine transformation over its latent feature. Perhaps more surprisingly, such a micromotion subspace, even learned from just a single target face, can be painlessly transferred to other unseen face images, even those from vastly different domains (such as oil painting, cartoon, and sculpture faces). This demonstrates that the local feature geometry corresponding to one type of micromotion is aligned across different face subjects, and hence that StyleGAN-v2 is indeed "secretly" aware of the subject-disentangled feature variations caused by that micromotion. As an application, we present various successful examples of applying our low-dimensional micromotion subspace technique to directly and effortlessly manipulate faces. Compared with previous editing methods, our framework shows high robustness, low computational overhead, and impressive domain transferability. Our code is publicly available at https://github.com/wuqiuche/micromotion-StyleGAN.
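The abstract describes micromotion editing as an affine transformation along a direction decoded from a low-rank subspace of the StyleGAN-v2 latent space. The sketch below is illustrative only (not the authors' released code): it shows how such a workflow could look, with SVD over latent codes inverted from an anchor clip yielding a dominant direction that is then added to an unseen face's latent code. The names anchor_latents, w_target, and generator.synthesis are assumptions introduced for illustration.

```python
# Illustrative sketch (not the authors' implementation) of low-rank micromotion editing.
# Assumes `anchor_latents` is an array of shape (T, D) holding StyleGAN-v2 latent codes
# (e.g., W-space vectors) inverted from T frames of an anchor video of one subject,
# and `w_target` is the latent code of a new, possibly out-of-domain face.
import numpy as np

def micromotion_direction(anchor_latents: np.ndarray) -> np.ndarray:
    """Return the dominant direction of variation of the anchor latents (via SVD/PCA)."""
    mean = anchor_latents.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered codes span the low-rank micromotion subspace.
    _, _, vt = np.linalg.svd(anchor_latents - mean, full_matrices=False)
    return vt[0]  # principal direction, shape (D,)

def apply_micromotion(w_target: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Affine edit of a latent code: move along the micromotion direction by strength alpha."""
    return w_target + alpha * direction

# Hypothetical usage: sweep alpha to synthesize a short "smile" trajectory for an unseen
# face, decoding each edited latent with a pre-trained StyleGAN-v2 generator.
# d = micromotion_direction(anchor_latents)
# frames = [generator.synthesis(apply_micromotion(w_target, d, a)) for a in np.linspace(0, 3, 16)]
```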

Cite this Paper


BibTeX
@InProceedings{pmlr-v234-wu24a,
  title     = {Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN},
  author    = {Wu, Qiucheng and Jiang, Yifan and Wu, Junru and Wang, Kai and Zhang, Eric and Shi, Humphrey and Wang, Zhangyang and Chang, Shiyu},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {108--133},
  year      = {2024},
  editor    = {Chi, Yuejie and Dziugaite, Gintare Karolina and Qu, Qing and Wang, Atlas and Zhu, Zhihui},
  volume    = {234},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Jan},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v234/wu24a/wu24a.pdf},
  url       = {https://proceedings.mlr.press/v234/wu24a.html},
  abstract  = {The disentanglement of the StyleGAN latent space has paved the way for realistic and controllable image editing, but does StyleGAN know anything about temporal motion, as it was only trained on static images? To study the motion features in the latent space of StyleGAN, in this paper, we hypothesize and demonstrate that a series of meaningful, natural, and versatile small, local movements (referred to as "micromotion", such as expression, head movement, and aging effects) can be represented in low-rank spaces extracted from the latent space of a conventionally pre-trained StyleGAN-v2 model for face generation, with the guidance of proper "anchors" in the form of either short text or video clips. Starting from one target face image, with the editing direction decoded from the low-rank space, its micromotion features can be represented as simply as an affine transformation over its latent feature. Perhaps more surprisingly, such a micromotion subspace, even learned from just a single target face, can be painlessly transferred to other unseen face images, even those from vastly different domains (such as oil painting, cartoon, and sculpture faces). This demonstrates that the local feature geometry corresponding to one type of micromotion is aligned across different face subjects, and hence that StyleGAN-v2 is indeed "secretly" aware of the subject-disentangled feature variations caused by that micromotion. As an application, we present various successful examples of applying our low-dimensional micromotion subspace technique to directly and effortlessly manipulate faces. Compared with previous editing methods, our framework shows high robustness, low computational overhead, and impressive domain transferability. Our code is publicly available at https://github.com/wuqiuche/micromotion-StyleGAN.}
}
Endnote
%0 Conference Paper
%T Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN
%A Qiucheng Wu
%A Yifan Jiang
%A Junru Wu
%A Kai Wang
%A Eric Zhang
%A Humphrey Shi
%A Zhangyang Wang
%A Shiyu Chang
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Yuejie Chi
%E Gintare Karolina Dziugaite
%E Qing Qu
%E Atlas Wang
%E Zhihui Zhu
%F pmlr-v234-wu24a
%I PMLR
%P 108--133
%U https://proceedings.mlr.press/v234/wu24a.html
%V 234
%X The disentanglement of the StyleGAN latent space has paved the way for realistic and controllable image editing, but does StyleGAN know anything about temporal motion, as it was only trained on static images? To study the motion features in the latent space of StyleGAN, in this paper, we hypothesize and demonstrate that a series of meaningful, natural, and versatile small, local movements (referred to as "micromotion", such as expression, head movement, and aging effects) can be represented in low-rank spaces extracted from the latent space of a conventionally pre-trained StyleGAN-v2 model for face generation, with the guidance of proper "anchors" in the form of either short text or video clips. Starting from one target face image, with the editing direction decoded from the low-rank space, its micromotion features can be represented as simply as an affine transformation over its latent feature. Perhaps more surprisingly, such a micromotion subspace, even learned from just a single target face, can be painlessly transferred to other unseen face images, even those from vastly different domains (such as oil painting, cartoon, and sculpture faces). This demonstrates that the local feature geometry corresponding to one type of micromotion is aligned across different face subjects, and hence that StyleGAN-v2 is indeed "secretly" aware of the subject-disentangled feature variations caused by that micromotion. As an application, we present various successful examples of applying our low-dimensional micromotion subspace technique to directly and effortlessly manipulate faces. Compared with previous editing methods, our framework shows high robustness, low computational overhead, and impressive domain transferability. Our code is publicly available at https://github.com/wuqiuche/micromotion-StyleGAN.
APA
Wu, Q., Jiang, Y., Wu, J., Wang, K., Zhang, E., Shi, H., Wang, Z. & Chang, S. (2024). Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 234:108-133. Available from https://proceedings.mlr.press/v234/wu24a.html.