FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis

March 17, 2025
Authors: Luxi Chen, Zihan Zhou, Min Zhao, Yikai Wang, Ge Zhang, Wenhao Huang, Hao Sun, Ji-Rong Wen, Chongxuan Li
cs.AI

Abstract

Generating flexible-view 3D scenes, including 360° rotation and zooming, from single images is challenging due to a lack of 3D data. To this end, we introduce FlexWorld, a novel framework consisting of two key components: (1) a strong video-to-video (V2V) diffusion model to generate high-quality novel view images from incomplete input rendered from a coarse scene, and (2) a progressive expansion process to construct a complete 3D scene. In particular, leveraging an advanced pre-trained video model and accurate depth-estimated training pairs, our V2V model can generate novel views under large camera pose variations. Building upon it, FlexWorld progressively generates new 3D content and integrates it into the global scene through geometry-aware scene fusion. Extensive experiments demonstrate the effectiveness of FlexWorld in generating high-quality novel view videos and flexible-view 3D scenes from single images, achieving superior visual quality under multiple popular metrics and datasets compared to existing state-of-the-art methods. Qualitatively, we highlight that FlexWorld can generate high-fidelity scenes with flexible views like 360° rotations and zooming. Project page: https://ml-gsai.github.io/FlexWorld.
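
The abstract does not say how the coarse scene is built from the input image. One common way to lift a single image into a coarse 3D point set is to back-project an estimated depth map through a pinhole camera model. The sketch below illustrates only that generic step, under stated assumptions: the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are hypothetical and are not taken from FlexWorld.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud in camera coordinates.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels. Illustrative helper, not FlexWorld code.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy input: a 2x3 image whose pixels all lie 2 units from the camera.
depth = np.full((2, 3), 2.0)
points = backproject_depth(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Rendering such a point set from a new camera pose leaves holes wherever the original view saw nothing, which is the kind of incomplete input the abstract says the V2V model is trained to complete.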

