Effective and Efficient Masked Image Generation Models

Zebin You, Jingyang Ou, Xiaolu Zhang, Jun Hu, Jun Zhou, Chongxuan Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:72730-72746, 2025.

Abstract

Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet $256\times256$, with a similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as NFE and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion model REPA while requiring less than 45% of the NFE. Additionally, on ImageNet $512\times512$, eMIGM outperforms the strong continuous diffusion model EDM2. Code is available at https://github.com/ML-GSAI/eMIGM.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-you25d,
  title = {Effective and Efficient Masked Image Generation Models},
  author = {You, Zebin and Ou, Jingyang and Zhang, Xiaolu and Hu, Jun and Zhou, Jun and Li, Chongxuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {72730--72746},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/you25d/you25d.pdf},
  url = {https://proceedings.mlr.press/v267/you25d.html},
  abstract = {Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet $256\times256$, with similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as NFE and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion model REPA while requiring less than 45% of the NFE. Additionally, on ImageNet $512\times512$, eMIGM outperforms the strong continuous diffusion model EDM2. Code is available at https://github.com/ML-GSAI/eMIGM.}
}
Endnote
%0 Conference Paper
%T Effective and Efficient Masked Image Generation Models
%A Zebin You
%A Jingyang Ou
%A Xiaolu Zhang
%A Jun Hu
%A Jun Zhou
%A Chongxuan Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-you25d
%I PMLR
%P 72730--72746
%U https://proceedings.mlr.press/v267/you25d.html
%V 267
%X Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet $256\times256$, with similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as NFE and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion model REPA while requiring less than 45% of the NFE. Additionally, on ImageNet $512\times512$, eMIGM outperforms the strong continuous diffusion model EDM2. Code is available at https://github.com/ML-GSAI/eMIGM.
APA
You, Z., Ou, J., Zhang, X., Hu, J., Zhou, J., & Li, C. (2025). Effective and Efficient Masked Image Generation Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:72730-72746. Available from https://proceedings.mlr.press/v267/you25d.html.
