MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion
July 3, 2023
Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
cs.AI
Abstract
This paper introduces MVDiffusion, a simple yet effective multi-view image
generation method for scenarios where pixel-to-pixel correspondences are
available, such as perspective crops from a panorama or multi-view images given
geometry (depth maps and poses). Unlike prior models that rely on iterative
image warping and inpainting, MVDiffusion generates all images concurrently
with global awareness, producing high-resolution, content-rich results and
effectively addressing the error accumulation prevalent in those models.
MVDiffusion specifically incorporates a correspondence-aware attention
mechanism, enabling effective cross-view interaction. This mechanism underpins
three pivotal modules: 1) a generation module that produces low-resolution
images while maintaining global correspondence, 2) an interpolation module that
densifies spatial coverage between images, and 3) a super-resolution module
that upscales the images to high-resolution outputs. For panoramic imagery,
MVDiffusion can generate high-resolution photorealistic images of up to
1024×1024 pixels. For geometry-conditioned multi-view image generation,
MVDiffusion is the first method capable of generating a textured map
of a scene mesh. The project page is at https://mvdiffusion.github.io.
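The abstract does not detail how correspondence-aware attention is computed. The following is a minimal plain-Python sketch of the general idea under stated assumptions; it is not the authors' implementation. It assumes perspective crops of a single panorama, which share one camera center, so a pixel in one crop maps to another crop through the standard rotation homography H = K R2 R1^T K^-1. Each query feature then attends only to a small neighborhood around its corresponding pixel in the other view. The helper names (`crop_homography`, `warp`, `correspondence_attention`), the yaw-only rotations, the neighborhood radius, and the use of raw features as keys and values (no learned projections or positional encodings) are hypothetical simplifications.

```python
import math
from typing import Optional

Mat = list[list[float]]  # 3x3 matrix stored row-major as nested lists
Pixel = tuple[float, float]


def matmul(a: Mat, b: Mat) -> Mat:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat) -> Mat:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def intrinsics(f: float, cx: float, cy: float) -> tuple[Mat, Mat]:
    """Pinhole intrinsics K and its closed-form inverse (square pixels, no skew)."""
    k = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    k_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return k, k_inv


def yaw(theta: float) -> Mat:
    """Hypothetical world-to-camera rotation about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def crop_homography(k: Mat, k_inv: Mat, r1: Mat, r2: Mat) -> Mat:
    """H = K R2 R1^T K^-1: maps view-1 pixels to view-2 pixels when both
    crops share one camera center, as perspective crops of a panorama do."""
    return matmul(k, matmul(r2, matmul(transpose(r1), k_inv)))


def warp(h: Mat, u: float, v: float) -> Optional[Pixel]:
    """Map pixel (u, v) through H; None if the ray points behind view 2."""
    x, y, w = (h[i][0] * u + h[i][1] * v + h[i][2] for i in range(3))
    if w <= 1e-12:
        return None
    return x / w, y / w


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correspondence_attention(query: list[float], target: list[list[list[float]]],
                             match: Optional[Pixel], radius: int = 1) -> list[float]:
    """Scaled dot-product attention from one query feature to the
    (2r+1)x(2r+1) neighborhood of its matched pixel in target[y][x].
    Keys and values are the raw target features (no learned projections)."""
    if match is None:
        return [0.0] * len(query)
    rows, cols = len(target), len(target[0])
    mx, my = round(match[0]), round(match[1])
    keys = [target[y][x]
            for y in range(my - radius, my + radius + 1)
            for x in range(mx - radius, mx + radius + 1)
            if 0 <= y < rows and 0 <= x < cols]
    if not keys:  # the match falls outside the target view
        return [0.0] * len(query)
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(len(query))]


if __name__ == "__main__":
    k, k_inv = intrinsics(f=2.0, cx=2.0, cy=2.0)
    h = crop_homography(k, k_inv, yaw(0.0), yaw(math.pi / 8))
    match = warp(h, 2.0, 2.0)  # center pixel of view 1, located in view 2
    target = [[[float(x), float(y)] for x in range(5)] for y in range(5)]
    print(match, correspondence_attention([1.0, 0.0], target, match))
```

In this sketch, restricting each query to features near its geometric correspondence is what ties the cross-view interaction to the known pixel mapping. For multi-view images given depth maps and poses, `warp` would instead reproject each pixel using its depth; that case is not shown here.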