MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

July 3, 2023
Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
cs.AI

Abstract

This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from a panorama or multi-view images given geometry (depth maps and poses). Unlike prior models that rely on iterative image warping and inpainting, MVDiffusion generates all images concurrently with global awareness, producing high-resolution, content-rich results and avoiding the error accumulation prevalent in those earlier models. MVDiffusion incorporates a correspondence-aware attention mechanism that enables effective cross-view interaction. This mechanism underpins three pivotal modules: 1) a generation module that produces low-resolution images while maintaining global correspondence, 2) an interpolation module that densifies spatial coverage between images, and 3) a super-resolution module that upscales them into high-resolution outputs. For panoramic imagery, MVDiffusion generates high-resolution photorealistic images of up to 1024×1024 pixels. For geometry-conditioned multi-view image generation, MVDiffusion is the first method capable of generating a texture map for a scene mesh. The project page is at https://mvdiffusion.github.io.
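In the panorama case, correspondences come from geometry alone. Perspective crops cut from one panorama share a single camera center, so a pixel in one crop maps into another crop through the rotation-only homography H = K R_dst R_src^T K^-1. The following is a minimal NumPy illustration of that standard mapping and is not MVDiffusion's code. The helper names (`intrinsics`, `yaw_rotation`, `map_pixels`), the 512×512 crop size, the 90° field of view and the 45° yaw spacing are all illustrative assumptions.

```python
import numpy as np


def intrinsics(width, height, fov_deg):
    """Pinhole intrinsics with square pixels and the given horizontal field of view."""
    f = 0.5 * width / np.tan(np.radians(fov_deg) / 2)
    return np.array([[f, 0.0, width / 2],
                     [0.0, f, height / 2],
                     [0.0, 0.0, 1.0]])


def yaw_rotation(yaw_deg):
    """World-to-camera rotation for a camera turned by yaw_deg about the vertical (y) axis."""
    t = np.radians(yaw_deg)
    c, s = np.cos(t), np.sin(t)
    cam_to_world = np.array([[c, 0.0, s],
                             [0.0, 1.0, 0.0],
                             [-s, 0.0, c]])
    return cam_to_world.T


def map_pixels(pixels, K, R_src, R_dst, width, height):
    """Map (N, 2) pixel coordinates from a source crop into a destination crop.

    Both crops share one camera center, so the mapping is the rotation-only
    homography H = K @ R_dst @ R_src.T @ inv(K). Returns the mapped (N, 2)
    coordinates and a mask that is True where the ray is in front of the
    destination camera and lands inside its image.
    """
    H = K @ R_dst @ R_src.T @ np.linalg.inv(K)
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    mapped = homog @ H.T
    in_front = mapped[:, 2] > 0
    z = np.where(in_front, mapped[:, 2], 1.0)  # avoid dividing by zero or negative depth
    uv = mapped[:, :2] / z[:, None]
    inside = (in_front
              & (uv[:, 0] >= 0) & (uv[:, 0] < width)
              & (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return uv, inside


# Two 512x512 crops with a 90-degree FOV, turned 45 degrees apart (illustrative values).
K = intrinsics(512, 512, fov_deg=90.0)
R_a, R_b = yaw_rotation(0.0), yaw_rotation(45.0)
pts = np.array([[0.0, 256.0], [128.0, 256.0], [511.0, 256.0]])  # pixels in crop b
uv, inside = map_pixels(pts, K, R_b, R_a, 512, 512)
print(uv.round(2))
print(inside)
```

With these values the left edge of crop b lands on the center of crop a, and the pixel a quarter of the way across crop b lands at u ≈ 341.33. The right edge of crop b falls outside crop a, so only the overlapping half of each crop receives correspondences.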
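When geometry is given as depth maps and poses, a correspondence can be obtained by unprojecting a pixel with its depth and reprojecting the resulting 3D point into the other view. The following minimal NumPy sketch rests on several assumed conventions: both views share one pinhole intrinsic matrix K, poses are world-to-camera (X_cam = R X_world + t), and depth means z-depth rather than ray length. The function name `reproject` is hypothetical. The sketch also omits the occlusion test that a real system would need, which compares the reprojected depth against the target view's own depth map.

```python
import numpy as np


def reproject(u, v, depth, K, R_src, t_src, R_dst, t_dst):
    """Map pixel (u, v) with z-depth `depth` in the source view into the destination view.

    Poses are world-to-camera: X_cam = R @ X_world + t.
    Returns (u', v', z'), where z' is the point's depth in the destination camera.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    X_src = depth * ray                    # 3D point in source camera coordinates
    X_world = R_src.T @ (X_src - t_src)    # undo the source pose
    X_dst = R_dst @ X_world + t_dst        # into destination camera coordinates
    p = K @ X_dst                          # project with the shared intrinsics
    return p[0] / p[2], p[1] / p[2], X_dst[2]


K = np.array([[256.0, 0.0, 256.0],
              [0.0, 256.0, 256.0],
              [0.0, 0.0, 1.0]])
I, zero = np.eye(3), np.zeros(3)
# Destination camera centered at C = (1, 0, 0) with no rotation, so t = -R @ C.
t_dst = -(I @ np.array([1.0, 0.0, 0.0]))
u2, v2, z2 = reproject(256.0, 256.0, 4.0, K, I, zero, I, t_dst)
print(u2, v2, z2)
```

A 1-unit baseline at depth 4 with focal length 256 shifts the pixel by f·b/d = 64 pixels, so the image center maps to u' = 192 in the second view.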
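One way to picture correspondence-aware attention is that each source pixel attends only to target-view features in a small window around its corresponding location. The NumPy sketch below illustrates that idea under heavy simplifying assumptions and does not reproduce the paper's layer. It uses a single head and treats raw features as queries, keys and values, whereas a real layer would learn those projections. It also omits any positional encoding of offsets within the window. Integer (row, col) correspondences are assumed to be precomputed, for example from a homography or a depth reprojection. The names `correspondence_attention` and `radius` are hypothetical.

```python
import numpy as np


def correspondence_attention(src_feat, tgt_feat, corr, radius=1):
    """Aggregate target-view features for each source pixel (simplified sketch).

    src_feat: (H, W, C) source-view features, used directly as queries.
    tgt_feat: (H, W, C) target-view features, used directly as keys and values.
    corr:     (H, W, 2) integer (row, col) of each source pixel's correspondence
              in the target view, or -1 where there is none.
    Returns (H, W, C) messages; rows without a correspondence stay zero.
    """
    H, W, C = src_feat.shape
    out = np.zeros_like(src_feat)
    for i in range(H):
        for j in range(W):
            tr, tc = corr[i, j]
            if tr < 0:
                continue
            # Gather target features in a (2*radius+1)^2 window, clipped at image borders.
            r0, r1 = max(tr - radius, 0), min(tr + radius + 1, H)
            c0, c1 = max(tc - radius, 0), min(tc + radius + 1, W)
            keys = tgt_feat[r0:r1, c0:c1].reshape(-1, C)
            # Scaled dot-product attention with a numerically stable softmax.
            logits = keys @ src_feat[i, j] / np.sqrt(C)
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()
            out[i, j] = weights @ keys
    return out


rng = np.random.default_rng(0)
src = rng.normal(size=(4, 4, 8))
tgt = rng.normal(size=(4, 4, 8))
corr = np.full((4, 4, 2), -1)   # -1 marks "no correspondence"
corr[1, 2] = (3, 0)             # source pixel (1, 2) corresponds to target pixel (3, 0)
msg = correspondence_attention(src, tgt, corr, radius=1)
print(msg[1, 2].shape)          # (8,)
```

Because the softmax weights sum to one, each message is a convex combination of nearby target features. A full layer would typically also combine these messages with the source features (for instance through a residual connection) and aggregate them over all neighboring views.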