Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
December 3, 2023
Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein
cs.AI
Abstract
Traditional 3D content creation tools empower users to bring their
imagination to life by giving them direct control over a scene's geometry,
appearance, motion, and camera path. Creating computer-generated videos,
however, is a tedious manual process, which can be automated by emerging
text-to-video diffusion models. Despite great promise, video diffusion models
are difficult to control, hindering users from applying their own creativity rather
than amplifying it. To address this challenge, we present a novel approach that
combines the controllability of dynamic 3D meshes with the expressivity and
editability of emerging diffusion models. For this purpose, our approach takes
an animated, low-fidelity rendered mesh as input and injects the ground truth
correspondence information obtained from the dynamic mesh into various stages
of a pre-trained text-to-image generation model to output high-quality and
temporally consistent frames. We demonstrate our approach on various examples
where motion can be obtained by animating rigged assets or changing the camera
path.
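
The abstract's key technical ingredient is injecting ground-truth correspondence information, rendered from the animated mesh, into a pre-trained text-to-image model. The sketch below is not the authors' implementation; it is a minimal illustration of one way such correspondences could be used, namely giving pixels that see the same surface point (same UV coordinate) identical initial diffusion noise in every frame, which encourages temporally consistent generations. The function name, the UV-map input format, and all resolutions are assumptions made for illustration.

```python
# Minimal sketch (illustrative, not the paper's code): correspondence-consistent
# noise initialization from per-frame UV maps rendered off an animated mesh.
import numpy as np

def correspondence_consistent_noise(uv_maps, latent_hw=(64, 64), channels=4, seed=0):
    """uv_maps: list of (H, W, 2) arrays of per-pixel UV coordinates in [0, 1],
    one per rendered frame of the animated mesh (assumed input format).
    Returns one latent-resolution noise map per frame in which pixels that map
    to the same UV texel receive the same Gaussian noise sample."""
    rng = np.random.default_rng(seed)
    tex_res = 256  # resolution of the shared noise "texture" on the UV atlas
    noise_texture = rng.standard_normal((tex_res, tex_res, channels)).astype(np.float32)

    H, W = latent_hw
    frames = []
    for uv in uv_maps:
        # Downsample the UV map to the latent resolution (nearest neighbour).
        ys = np.linspace(0, uv.shape[0] - 1, H).astype(int)
        xs = np.linspace(0, uv.shape[1] - 1, W).astype(int)
        uv_small = uv[np.ix_(ys, xs)]                      # (H, W, 2)
        # Look up the shared noise texture at each pixel's UV coordinate.
        u = np.clip((uv_small[..., 0] * (tex_res - 1)).astype(int), 0, tex_res - 1)
        v = np.clip((uv_small[..., 1] * (tex_res - 1)).astype(int), 0, tex_res - 1)
        frames.append(noise_texture[v, u])                 # (H, W, channels)
    return frames

# Usage with dummy UV maps standing in for renders of an animated mesh.
dummy_uvs = [np.random.rand(512, 512, 2).astype(np.float32) for _ in range(3)]
noise_per_frame = correspondence_consistent_noise(dummy_uvs)
print(noise_per_frame[0].shape)  # (64, 64, 4)
```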