MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Abstract

This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from a panorama or multi-view images with known geometry (depth maps and poses). Unlike prior models that rely on iterative image warping and inpainting, MVDiffusion generates all images in parallel with global awareness, producing high-resolution images with rich content and avoiding the error accumulation common in earlier models. At its core is a correspondence-aware attention mechanism that enables effective cross-view interaction. This mechanism underpins three key modules: 1) a generation module that produces low-resolution images while maintaining global correspondence, 2) an interpolation module that densifies spatial coverage between images, and 3) a super-resolution module that upscales them into high-resolution outputs. For panoramic imagery, MVDiffusion generates photorealistic images at resolutions up to 1024×1024 pixels. For geometry-conditioned multi-view image generation, MVDiffusion is the first method capable of generating a texture map for a scene mesh. The project page is at https://mvdiffusion.github.io.
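To make "pixel-to-pixel correspondences" and "correspondence-aware attention" concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes two square perspective crops of a panorama that share a camera center and differ only by yaw, so a source pixel maps to the target view through the rotation-only homography H = K R_tgt R_src^T K^-1. The attention step is a single-head simplification: each source pixel with a valid correspondence attends over a k×k window around its corresponding target pixel, and the result is added residually. All function names (`intrinsics`, `yaw_rotation`, `correspondences`, `correspondence_aware_attention`), the 3×3 window, nearest-pixel rounding and the random projection weights are hypothetical choices for illustration; multi-head structure and any positional encoding are omitted.

```python
import numpy as np


def intrinsics(fov_deg: float, size: int) -> np.ndarray:
    """Pinhole intrinsics for a square size x size crop with the given field of view."""
    f = 0.5 * size / np.tan(np.deg2rad(fov_deg) / 2.0)
    c = (size - 1) / 2.0  # principal point at the image center (integer pixel coords)
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])


def yaw_rotation(yaw_deg: float) -> np.ndarray:
    """World-to-camera rotation for a camera turned by yaw_deg about the vertical y axis."""
    a = np.deg2rad(yaw_deg)
    r_c2w = np.array([[np.cos(a), 0.0, np.sin(a)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(a), 0.0, np.cos(a)]])
    return r_c2w.T


def correspondences(K, R_src, R_tgt, size):
    """Map each source pixel to a target pixel via H = K R_tgt R_src^T K^-1.

    Returns integer (u, v) target coordinates of shape (size, size, 2) and a
    boolean mask marking pixels that land in front of and inside the target view.
    """
    H = K @ R_tgt @ R_src.T @ np.linalg.inv(K)
    v, u = np.mgrid[0:size, 0:size]
    pts = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    proj = pts @ H.T  # homogeneous target coordinates, one row per source pixel
    z = proj[:, 2]
    in_front = z > 1e-8  # the ray must point into the target camera
    uv = np.full((proj.shape[0], 2), -1, dtype=int)
    uv[in_front] = np.rint(proj[in_front, :2] / z[in_front, None]).astype(int)
    inside = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < size) & (uv[:, 1] >= 0) & (uv[:, 1] < size)
    return uv.reshape(size, size, 2), inside.reshape(size, size)


def correspondence_aware_attention(f_src, f_tgt, uv, valid, Wq, Wk, Wv, k=3):
    """Single-head sketch: each valid source pixel attends over the k x k target
    window around its correspondence; invalid pixels pass through unchanged."""
    size, _, c = f_src.shape
    out = f_src.copy()
    r = k // 2
    for y in range(size):
        for x in range(size):
            if not valid[y, x]:
                continue
            tu, tv = uv[y, x]
            # Window indices are clipped, so border pixels may be repeated.
            ys = np.clip(np.arange(tv - r, tv + r + 1), 0, size - 1)
            xs = np.clip(np.arange(tu - r, tu + r + 1), 0, size - 1)
            neigh = f_tgt[np.ix_(ys, xs)].reshape(-1, c)  # (k*k, c)
            q = f_src[y, x] @ Wq
            keys, vals = neigh @ Wk, neigh @ Wv
            logits = keys @ q / np.sqrt(c)
            w = np.exp(logits - logits.max())  # numerically stable softmax
            w /= w.sum()
            out[y, x] = f_src[y, x] + w @ vals  # residual update
    return out


rng = np.random.default_rng(0)
size, c = 16, 8
K = intrinsics(90.0, size)
uv, valid = correspondences(K, yaw_rotation(0.0), yaw_rotation(45.0), size)
f_src = rng.standard_normal((size, size, c))
f_tgt = rng.standard_normal((size, size, c))
Wq, Wk, Wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
out = correspondence_aware_attention(f_src, f_tgt, uv, valid, Wq, Wk, Wv)
print(valid.mean())  # fraction of source pixels that overlap the rotated view
```

Restricting each query to a small window around its geometric correspondence, rather than attending to every pixel of every other view, is what ties the cross-view interaction to the known pixel-to-pixel mapping. For the depth-and-pose setting, the homography would be replaced by back-projecting each pixel with its depth and reprojecting it into the other camera.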
