Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
December 3, 2023
Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein
cs.AI
Abstract
Traditional 3D content creation tools empower users to bring their
imagination to life by giving them direct control over a scene's geometry,
appearance, motion, and camera path. Creating computer-generated videos,
however, is a tedious manual process, which can be automated by emerging
text-to-video diffusion models. Despite their great promise, video diffusion
models are difficult to control, hindering users from applying their own
creativity rather than amplifying it. To address this challenge, we present a
novel approach that
combines the controllability of dynamic 3D meshes with the expressivity and
editability of emerging diffusion models. For this purpose, our approach takes
an animated, low-fidelity rendered mesh as input and injects the ground truth
correspondence information obtained from the dynamic mesh into various stages
of a pre-trained text-to-image generation model to output high-quality and
temporally consistent frames. We demonstrate our approach on various examples
where motion can be obtained by animating rigged assets or changing the camera
path.
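To make the described pipeline concrete, below is a minimal, illustrative sketch and not the authors' implementation: all names such as `FrameGuidance`, `render_guidance`, and `denoise_step` are hypothetical placeholders. It only mirrors the data flow stated in the abstract, where per-frame guidance (e.g., depth and correspondence maps) rendered from an animated mesh is injected into the denoising loop of a pretrained text-to-image model so that pixels covering the same mesh point stay consistent across frames.

```python
# Hypothetical pipeline sketch; not the authors' code or a real library API.
from dataclasses import dataclass
from typing import Dict, List
import numpy as np


@dataclass
class FrameGuidance:
    depth: np.ndarray           # low-fidelity depth render of the animated mesh
    correspondence: np.ndarray  # per-pixel mesh-point IDs shared across frames


def render_guidance(num_frames: int, h: int = 64, w: int = 64) -> List[FrameGuidance]:
    """Placeholder for rasterizing the animated mesh into per-frame guidance maps."""
    return [
        FrameGuidance(depth=np.random.rand(h, w),
                      correspondence=np.random.randint(0, 1000, size=(h, w)))
        for _ in range(num_frames)
    ]


def denoise_step(latent: np.ndarray, guidance: FrameGuidance,
                 shared_features: Dict[float, np.ndarray]) -> np.ndarray:
    """Placeholder for one denoising step of a pretrained text-to-image model.

    In the approach described by the abstract, correspondence information is
    injected at various stages of the model; here a toy feature cache keyed by
    the correspondence map stands in for that sharing across frames.
    """
    key = float(guidance.correspondence.mean())      # stand-in for a correspondence lookup
    shared_features.setdefault(key, latent.copy())   # cache features the first time
    return 0.5 * latent + 0.5 * shared_features[key]  # toy "injection" for consistency


def generate_video(prompt: str, num_frames: int = 8, steps: int = 4) -> List[np.ndarray]:
    """Run the toy guided denoising loop over all frames jointly."""
    guidance = render_guidance(num_frames)
    latents = [np.random.randn(64, 64) for _ in range(num_frames)]
    shared_features: Dict[float, np.ndarray] = {}
    for _ in range(steps):
        latents = [denoise_step(z, g, shared_features)
                   for z, g in zip(latents, guidance)]
    return latents  # a real pipeline would decode these latents into RGB frames


if __name__ == "__main__":
    frames = generate_video("a knight walking through a forest")
    print(len(frames), frames[0].shape)
```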