FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Synthesis
March 17, 2025
Authors: Luxi Chen, Zihan Zhou, Min Zhao, Yikai Wang, Ge Zhang, Wenhao Huang, Hao Sun, Ji-Rong Wen, Chongxuan Li
cs.AI
Abstract
Generating flexible-view 3D scenes, including 360° rotation and zooming,
from single images is challenging due to a lack of 3D data. To this end, we
introduce FlexWorld, a novel framework consisting of two key components: (1) a
strong video-to-video (V2V) diffusion model to generate high-quality novel view
images from incomplete input rendered from a coarse scene, and (2) a
progressive expansion process to construct a complete 3D scene. In particular,
leveraging an advanced pre-trained video model and accurate depth-estimated
training pairs, our V2V model can generate novel views under large camera pose
variations. Building on this model, FlexWorld progressively generates new 3D content
and integrates it into the global scene through geometry-aware scene fusion.
Extensive experiments demonstrate the effectiveness of FlexWorld in generating
high-quality novel view videos and flexible-view 3D scenes from single images,
achieving superior visual quality under multiple popular metrics and datasets
compared to existing state-of-the-art methods. Qualitatively, we highlight that
FlexWorld can generate high-fidelity scenes with flexible views like 360°
rotations and zooming. Project page: https://ml-gsai.github.io/FlexWorld.
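The abstract describes the progressive expansion process only at a high level. The sketch below illustrates that control flow under loudly stated assumptions. The global scene is a voxel-indexed point set, and the cameras follow a 360° orbit around the origin. `toy_generate_view_points` is a hypothetical placeholder for rendering the coarse scene, running the V2V diffusion model, and back-projecting estimated depth. `fuse` uses a simple voxel-occupancy rule in place of FlexWorld's geometry-aware scene fusion, whose details are not given here. All function names are illustrative and are not the authors' code.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def orbit_camera_positions(n_views: int, radius: float, height: float = 0.0) -> List[Point]:
    """Camera centers evenly spaced on a full 360-degree orbit around the origin."""
    return [
        (radius * math.cos(2 * math.pi * i / n_views), height,
         radius * math.sin(2 * math.pi * i / n_views))
        for i in range(n_views)
    ]


def voxel_key(p: Point, voxel_size: float) -> Voxel:
    """Quantize a 3D point to the integer voxel that contains it."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def fuse(scene: Dict[Voxel, Point], new_points: List[Point], voxel_size: float) -> int:
    """Hypothetical stand-in for geometry-aware fusion: keep a new point only if
    its voxel is still empty, so content already in the global scene is never
    overwritten. Returns the number of points added."""
    added = 0
    for p in new_points:
        key = voxel_key(p, voxel_size)
        if key not in scene:
            scene[key] = p
            added += 1
    return added


def expand_scene(
    camera_positions: List[Point],
    generate_view_points: Callable[[Point, Dict[Voxel, Point]], List[Point]],
    voxel_size: float = 0.1,
) -> Tuple[Dict[Voxel, Point], List[int]]:
    """Progressive expansion: visit cameras in order, generate content for each
    view given the current scene, and fuse it into the global scene."""
    scene: Dict[Voxel, Point] = {}
    added_per_view: List[int] = []
    for cam in camera_positions:
        new_points = generate_view_points(cam, scene)
        added_per_view.append(fuse(scene, new_points, voxel_size))
    return scene, added_per_view


def toy_generate_view_points(cam: Point, scene: Dict[Voxel, Point],
                             fov_deg: int = 90) -> List[Point]:
    """Hypothetical placeholder for 'render coarse scene -> V2V diffusion ->
    depth -> back-project'. It ignores `scene` and returns points on a unit
    circle, at 1-degree steps, within the camera's horizontal field of view."""
    azimuth = round(math.degrees(math.atan2(cam[2], cam[0]))) % 360
    half = fov_deg // 2
    points: List[Point] = []
    for d in range(-half, half + 1):
        a = math.radians((azimuth + d) % 360)  # integer degrees -> identical floats on revisits
        points.append((math.cos(a), 0.0, math.sin(a)))
    return points


if __name__ == "__main__":
    cams = orbit_camera_positions(n_views=8, radius=3.0)
    scene, added = expand_scene(cams, toy_generate_view_points)
    print(len(scene), added)
```

On an 8-view orbit with a 90° field of view, the first view adds new points to the empty scene, and the final view adds none because everything it sees was already fused by earlier views. Keeping existing points fixed is one plausible way to keep newly generated content consistent with the scene built so far; the actual method may differ.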