MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

July 3, 2023
Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
cs.AI

Abstract

This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from a panorama or multi-view images given geometry (depth maps and poses). Unlike prior models that rely on iterative image warping and inpainting, MVDiffusion generates all images concurrently with global awareness, yielding high-resolution, content-rich results and effectively addressing the error accumulation prevalent in earlier models. MVDiffusion incorporates a correspondence-aware attention mechanism that enables effective cross-view interaction. This mechanism underpins three pivotal modules: 1) a generation module that produces low-resolution images while maintaining global correspondence, 2) an interpolation module that densifies spatial coverage between images, and 3) a super-resolution module that upscales them into high-resolution outputs. For panoramic imagery, MVDiffusion can generate high-resolution photorealistic images up to 1024×1024 pixels. For geometry-conditioned multi-view image generation, MVDiffusion is the first method capable of generating a textured map of a scene mesh. The project page is at https://mvdiffusion.github.io.
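
To make the phrase "pixel-to-pixel correspondences" concrete for the panorama case, here is a minimal sketch of how a pixel in one perspective crop can be mapped into a neighbouring crop when both share a camera centre and differ only by a yaw rotation. This is standard pinhole-camera geometry, not code from the MVDiffusion authors. The function name `map_pixel_between_views`, the square 256-pixel image, the 90-degree field of view, and the 45-degree yaw step are all assumptions chosen for illustration.

```python
import math

def map_pixel_between_views(u, v, yaw_rad, focal, cx, cy):
    """Map pixel (u, v) in view 1 to its corresponding pixel in view 2.

    Assumptions (illustrative only): both views are pinhole cameras with the
    same intrinsics and the same camera centre; view 2 is view 1 rotated by
    yaw_rad about the vertical axis (x right, y down, z forward).
    Returns None when the ray points behind view 2.
    """
    # Back-project the pixel to a ray in view 1's camera frame.
    dx = (u - cx) / focal
    dy = (v - cy) / focal
    dz = 1.0

    # Express the same ray in view 2's frame (inverse of the yaw rotation).
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    x2 = c * dx - s * dz
    y2 = dy
    z2 = s * dx + c * dz

    if z2 <= 0.0:
        return None  # The ray is not in front of view 2.

    # Project the ray onto view 2's image plane.
    return (focal * x2 / z2 + cx, focal * y2 / z2 + cy)


# Hypothetical setup: 256x256 crops with a 90-degree field of view.
SIZE = 256
FOCAL = (SIZE / 2) / math.tan(math.radians(90) / 2)
CX = CY = SIZE / 2

# Midpoint of view 1's right edge, seen from a view yawed by 45 degrees.
match = map_pixel_between_views(SIZE, CY, math.radians(45), FOCAL, CX, CY)
```

Under these assumptions, the midpoint of the right edge of view 1 lands at the image centre of view 2, so adjacent crops overlap by half their width. When depth maps and poses are given instead, the same back-project, transform, and re-project pattern applies, with each ray scaled by its depth and a full rigid transform in place of the pure rotation.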