MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion
July 3, 2023
Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
cs.AI
Abstract
This paper introduces MVDiffusion, a simple yet effective multi-view image
generation method for scenarios where pixel-to-pixel correspondences are
available, such as perspective crops from a panorama or multi-view images with
known geometry (depth maps and poses). Unlike prior models that rely on iterative
image warping and inpainting, MVDiffusion concurrently generates all images
with a global awareness, encompassing high resolution and rich content,
effectively addressing the error accumulation prevalent in preceding models.
MVDiffusion specifically incorporates a correspondence-aware attention
mechanism, enabling effective cross-view interaction. This mechanism underpins
three pivotal modules: 1) a generation module that produces low-resolution
images while maintaining global correspondence, 2) an interpolation module that
densifies spatial coverage between images, and 3) a super-resolution module
that upscales into high-resolution outputs. In terms of panoramic imagery,
MVDiffusion can generate high-resolution photorealistic images up to
1024×1024 pixels. For geometry-conditioned multi-view image generation,
MVDiffusion is the first method capable of generating a texture map
of a scene mesh. The project page is at https://mvdiffusion.github.io.
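The core idea of correspondence-aware attention is that each pixel in one view attends only to the pixels in other views that geometry maps it to, rather than to all pixels. The sketch below illustrates this restricted-attention pattern; the function name, the top-K correspondence-index format, and the single-head, unprojected query/key setup are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_aware_attention(feat_src, feat_tgt, corr):
    """Illustrative sketch of attention restricted to corresponding pixels.

    feat_src: (N, C) per-pixel features of the source view.
    feat_tgt: (M, C) per-pixel features of a target view.
    corr:     (N, K) integer indices into feat_tgt giving, for each source
              pixel, the K target pixels it corresponds to (as derived from
              panorama geometry or from depth maps and poses).
    Returns:  (N, C) aggregated features, one row per source pixel.
    """
    n, c = feat_src.shape
    out = np.empty_like(feat_src)
    for i in range(n):
        keys = feat_tgt[corr[i]]                        # (K, C) corresponding pixels only
        scores = softmax(feat_src[i] @ keys.T / np.sqrt(c))
        out[i] = scores @ keys                          # convex combination of keys
    return out
```

Because each source pixel mixes only its K geometric correspondences, the cost is O(N·K) rather than O(N·M), which is what makes cross-view interaction across many concurrently generated views tractable.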