Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
December 3, 2023
Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein
cs.AI
Abstract
Traditional 3D content creation tools empower users to bring their
imagination to life by giving them direct control over a scene's geometry,
appearance, motion, and camera path. Creating computer-generated videos,
however, is a tedious manual process, which can be automated by emerging
text-to-video diffusion models. Despite their great promise, video diffusion
models are difficult to control, hindering users from applying their own
creativity rather than amplifying it. To address this challenge, we present a
novel approach that
combines the controllability of dynamic 3D meshes with the expressivity and
editability of emerging diffusion models. For this purpose, our approach takes
an animated, low-fidelity rendered mesh as input and injects the ground truth
correspondence information obtained from the dynamic mesh into various stages
of a pre-trained text-to-image generation model to output high-quality and
temporally consistent frames. We demonstrate our approach on various examples
where motion can be obtained by animating rigged assets or changing the camera
path.
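To make the described pipeline concrete, below is a minimal, illustrative sketch and not the authors' implementation: all names such as `FrameGuidance`, `render_guidance`, and `denoise_step` are hypothetical placeholders. It only mirrors the data flow stated in the abstract, where per-frame guidance (e.g., depth and correspondence maps) rendered from an animated mesh is injected into the denoising loop of a pretrained text-to-image model so that pixels covering the same mesh point stay consistent across frames.

```python
# Hypothetical pipeline sketch; not the authors' code or a real library API.
from dataclasses import dataclass
from typing import Dict, List
import numpy as np


@dataclass
class FrameGuidance:
    depth: np.ndarray           # low-fidelity depth render of the animated mesh
    correspondence: np.ndarray  # per-pixel mesh-point IDs shared across frames


def render_guidance(num_frames: int, h: int = 64, w: int = 64) -> List[FrameGuidance]:
    """Placeholder for rasterizing the animated mesh into per-frame guidance maps."""
    return [
        FrameGuidance(depth=np.random.rand(h, w),
                      correspondence=np.random.randint(0, 1000, size=(h, w)))
        for _ in range(num_frames)
    ]


def denoise_step(latent: np.ndarray, guidance: FrameGuidance,
                 shared_features: Dict[float, np.ndarray]) -> np.ndarray:
    """Placeholder for one denoising step of a pretrained text-to-image model.

    In the approach described by the abstract, correspondence information is
    injected at various stages of the model; here a toy feature cache keyed by
    the correspondence map stands in for that sharing across frames.
    """
    key = float(guidance.correspondence.mean())      # stand-in for a correspondence lookup
    shared_features.setdefault(key, latent.copy())   # cache features the first time
    return 0.5 * latent + 0.5 * shared_features[key]  # toy "injection" for consistency


def generate_video(prompt: str, num_frames: int = 8, steps: int = 4) -> List[np.ndarray]:
    """Run the toy guided denoising loop over all frames jointly."""
    guidance = render_guidance(num_frames)
    latents = [np.random.randn(64, 64) for _ in range(num_frames)]
    shared_features: Dict[float, np.ndarray] = {}
    for _ in range(steps):
        latents = [denoise_step(z, g, shared_features)
                   for z, g in zip(latents, guidance)]
    return latents  # a real pipeline would decode these latents into RGB frames


if __name__ == "__main__":
    frames = generate_video("a knight walking through a forest")
    print(len(frames), frames[0].shape)
```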