MIRROR: Make Your Object-Level Multi-View Generation More Consistent with Training-Free Rectification
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:68900-68927, 2025.
Abstract
Multi-view diffusion has greatly advanced 3D content creation by generating multiple images from distinct views, achieving remarkably photorealistic results. However, existing works remain vulnerable to inconsistent 3D geometric structures (commonly known as the Janus Problem) and severe artifacts. In this paper, we introduce MIRROR, a versatile plug-and-play method that rectifies such inconsistencies in a training-free manner, yielding high-fidelity, realistic structures without compromising diversity. Our key idea is to trace the motion trajectory of physical points across adjacent viewpoints, enabling rectification based on neighboring observations of the same region. Technically, MIRROR comprises two core modules: the Trajectory Tracking Module (TTM), which performs pixel-wise trajectory tracking to label identical points across views, and the Feature Rectification Module (FRM), which explicitly adjusts each pixel embedding on noisy synthesized images by minimizing its distance to the corresponding block features in neighboring views, thereby achieving consistent outputs. Extensive evaluations demonstrate that MIRROR integrates seamlessly with a diverse range of off-the-shelf object-level multi-view diffusion models, efficiently and significantly enhancing both consistency and fidelity.
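To make the rectification idea concrete, below is a minimal sketch of how per-pixel features of one view could be pulled toward the features of the same physical points observed in a neighboring view. The function name, tensor shapes, the correspondence map `corr_ij`, and the blending weight `alpha` are illustrative assumptions rather than the paper's actual implementation.

```python
# Hedged sketch of cross-view feature rectification (not the authors' code).
import torch
import torch.nn.functional as F

def rectify_features(feat_i: torch.Tensor,
                     feat_j: torch.Tensor,
                     corr_ij: torch.Tensor,
                     valid_mask: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """Pull each pixel embedding of view i toward the feature of the same
    physical point observed in a neighboring view j.

    feat_i:     (C, H, W) noisy feature map of the view being rectified.
    feat_j:     (C, H, W) feature map of the neighboring view.
    corr_ij:    (H, W, 2) normalized sampling coordinates in [-1, 1] mapping
                each pixel of view i to its corresponding location in view j
                (e.g., produced by a trajectory-tracking step).
    valid_mask: (H, W) boolean mask of pixels with a valid correspondence.
    alpha:      interpolation weight toward the neighboring observation.
    """
    # Sample the neighbor's features at the corresponding locations.
    sampled = F.grid_sample(feat_j.unsqueeze(0),           # (1, C, H, W)
                            corr_ij.unsqueeze(0),          # (1, H, W, 2)
                            mode="bilinear",
                            align_corners=True).squeeze(0)  # (C, H, W)

    # The convex combination below is the closed-form minimizer of
    # alpha*||f - sampled||^2 + (1 - alpha)*||f - feat_i||^2,
    # i.e., it reduces the distance to the neighbor's observation while
    # staying close to the current feature. Apply only where matches exist.
    blended = (1.0 - alpha) * feat_i + alpha * sampled
    mask = valid_mask.unsqueeze(0).to(feat_i.dtype)         # (1, H, W)
    return mask * blended + (1.0 - mask) * feat_i
```

In practice, such a step would be applied at selected denoising timesteps for each view against its neighbors; the choice of blending weight and which timesteps to rectify are further assumptions not specified by the abstract.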