

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

April 2, 2025
Authors: Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan
cs.AI

Abstract

Recovering 3D scenes from sparse views is a challenging task because it is inherently ill-posed. Conventional methods have developed specialized solutions (e.g., geometry regularization or feed-forward deterministic models) to mitigate the issue, but they still degrade when input views overlap minimally and provide insufficient visual information. Fortunately, recent video generative models show promise in addressing this challenge, as they can generate video clips with plausible 3D structures. Powered by large pretrained video diffusion models, some pioneering studies have begun to explore the potential of the video generative prior and to create 3D scenes from sparse views. Despite impressive improvements, these methods are limited by slow inference and a lack of 3D constraints, leading to inefficiencies and reconstruction artifacts that do not align with real-world geometric structure. In this paper, we propose VideoScene to distill a video diffusion model to generate 3D scenes in one step, aiming to build an efficient and effective tool that bridges the gap from video to 3D. Specifically, we design a 3D-aware leap flow distillation strategy to leap over time-consuming redundant information, and we train a dynamic denoising policy network to adaptively determine the optimal leap timestep during inference. Extensive experiments demonstrate that VideoScene achieves faster and superior 3D scene generation compared with previous video diffusion models, highlighting its potential as an efficient tool for future video-to-3D applications. Project Page: https://hanyang-21.github.io/VideoScene
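The abstract's central idea, leaping from a coarse 3D-consistent render to an intermediate noise level chosen by a policy network and denoising in a single step, can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: the module names (LeapPolicy, one_step_generate), the linear interpolation noise schedule, the tensor shapes, and the use of a feed-forward render as the starting point are all assumptions made for clarity.

```python
# Minimal sketch of the one-step "leap flow" idea described in the abstract.
# All names, shapes, and the noise schedule below are illustrative assumptions,
# not the VideoScene codebase.

import torch
import torch.nn as nn


class LeapPolicy(nn.Module):
    """Tiny stand-in for the dynamic denoising policy network: maps features
    of the coarse render to a leap timestep in (0, 1)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(feats))  # normalized leap timestep


def one_step_generate(student, policy: LeapPolicy,
                      coarse_render: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
    """One-step generation starting from a coarse, 3D-consistent render
    (e.g., frames rendered from a feed-forward reconstruction of the sparse
    input views -- an assumption about the pipeline)."""
    t = policy(feats)                      # adaptive leap timestep, shape (B, 1)
    t = t.view(-1, 1, 1, 1, 1)             # broadcast over (B, T, C, H, W)
    noise = torch.randn_like(coarse_render)
    # Diffuse the render only up to t (simple linear interpolation schedule
    # for illustration), leaping over the early high-noise timesteps.
    x_t = (1.0 - t) * coarse_render + t * noise
    # A single forward pass of the distilled student predicts the clean video.
    return student(x_t, t)


if __name__ == "__main__":
    student = lambda x, t: x               # placeholder for the distilled video model
    policy = LeapPolicy()
    render = torch.randn(1, 8, 3, 64, 64)  # dummy batch: 8 frames of 3x64x64
    feats = torch.randn(1, 64)
    video = one_step_generate(student, policy, render, feats)
    print(video.shape)                     # torch.Size([1, 8, 3, 64, 64])
```

The key design choice this sketch captures is that the student never denoises from pure noise: starting from a geometry-grounded render at an adaptively chosen intermediate timestep is what makes a single denoising step plausible.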

