Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
December 3, 2023
Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein
cs.AI
Abstract
Traditional 3D content creation tools empower users to bring their
imagination to life by giving them direct control over a scene's geometry,
appearance, motion, and camera path. Creating computer-generated videos,
however, is a tedious manual process, which can be automated by emerging
text-to-video diffusion models. Despite great promise, video diffusion models
are difficult to control, hindering users from applying their own creativity rather
than amplifying it. To address this challenge, we present a novel approach that
combines the controllability of dynamic 3D meshes with the expressivity and
editability of emerging diffusion models. For this purpose, our approach takes
an animated, low-fidelity rendered mesh as input and injects the ground truth
correspondence information obtained from the dynamic mesh into various stages
of a pre-trained text-to-image generation model to output high-quality and
temporally consistent frames. We demonstrate our approach on various examples
where motion can be obtained by animating rigged assets or changing the camera
path.
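
The abstract's key technical ingredient is injecting ground-truth correspondence information, rendered from the animated mesh, into a pre-trained text-to-image model. The sketch below is not the authors' implementation; it is a minimal illustration of one way such correspondences could be used, namely giving pixels that see the same surface point (same UV coordinate) identical initial diffusion noise in every frame, which encourages temporally consistent generations. The function name, the UV-map input format, and all resolutions are assumptions made for illustration.

```python
# Minimal sketch (illustrative, not the paper's code): correspondence-consistent
# noise initialization from per-frame UV maps rendered off an animated mesh.
import numpy as np

def correspondence_consistent_noise(uv_maps, latent_hw=(64, 64), channels=4, seed=0):
    """uv_maps: list of (H, W, 2) arrays of per-pixel UV coordinates in [0, 1],
    one per rendered frame of the animated mesh (assumed input format).
    Returns one latent-resolution noise map per frame in which pixels that map
    to the same UV texel receive the same Gaussian noise sample."""
    rng = np.random.default_rng(seed)
    tex_res = 256  # resolution of the shared noise "texture" on the UV atlas
    noise_texture = rng.standard_normal((tex_res, tex_res, channels)).astype(np.float32)

    H, W = latent_hw
    frames = []
    for uv in uv_maps:
        # Downsample the UV map to the latent resolution (nearest neighbour).
        ys = np.linspace(0, uv.shape[0] - 1, H).astype(int)
        xs = np.linspace(0, uv.shape[1] - 1, W).astype(int)
        uv_small = uv[np.ix_(ys, xs)]                      # (H, W, 2)
        # Look up the shared noise texture at each pixel's UV coordinate.
        u = np.clip((uv_small[..., 0] * (tex_res - 1)).astype(int), 0, tex_res - 1)
        v = np.clip((uv_small[..., 1] * (tex_res - 1)).astype(int), 0, tex_res - 1)
        frames.append(noise_texture[v, u])                 # (H, W, channels)
    return frames

# Usage with dummy UV maps standing in for renders of an animated mesh.
dummy_uvs = [np.random.rand(512, 512, 2).astype(np.float32) for _ in range(3)]
noise_per_frame = correspondence_consistent_noise(dummy_uvs)
print(noise_per_frame[0].shape)  # (64, 64, 4)
```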