Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Abstract

Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hindering users from applying their own creativity rather than amplifying it. To address this challenge, we present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. For this purpose, our approach takes an animated, low-fidelity rendered mesh as input and injects the ground-truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
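To make the notion of mesh-derived correspondence concrete, the following is a minimal sketch and not the authors' implementation. It assumes each rendered frame comes with a per-pixel UV map (the texture coordinate of the surface point visible at that pixel) and shows one possible way such correspondences could be used: noise is defined once in texture space, so the same surface point receives the same noise value in every frame. The function names, the nearest-texel lookup, and the treatment of background pixels are all hypothetical choices made for illustration.

```python
import random


def make_canonical_noise(size, seed=0):
    """Gaussian noise defined once in UV (texture) space and shared by all frames."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


def sample_frame_noise(uv_map, canonical, background_seed=0):
    """Build a per-frame noise image by looking up each pixel's UV coordinate.

    uv_map[y][x] is a (u, v) pair in [0, 1], or None for background pixels
    not covered by the mesh. Background pixels get independent noise
    (an assumption of this sketch).
    """
    size = len(canonical)
    rng = random.Random(background_seed)
    frame = []
    for row in uv_map:
        out_row = []
        for uv in row:
            if uv is None:
                out_row.append(rng.gauss(0.0, 1.0))
            else:
                u, v = uv
                # Nearest-texel lookup, clamped to the texture bounds.
                tx = min(int(u * size), size - 1)
                ty = min(int(v * size), size - 1)
                out_row.append(canonical[ty][tx])
        frame.append(out_row)
    return frame


# Toy example: a 1x3 image in which two surface points shift one pixel
# to the right between frame 0 and frame 1.
canonical = make_canonical_noise(8, seed=42)
frame0_uv = [[(0.1, 0.1), (0.6, 0.1), None]]
frame1_uv = [[None, (0.1, 0.1), (0.6, 0.1)]]
noise0 = sample_frame_noise(frame0_uv, canonical, background_seed=1)
noise1 = sample_frame_noise(frame1_uv, canonical, background_seed=2)

# The same surface point carries the same noise value in both frames.
print(noise0[0][0] == noise1[0][1], noise0[0][1] == noise1[0][2])
```

In this toy setup the noise follows the surface rather than the pixel grid, which is the sense in which correspondences from an animated mesh can help a per-frame image model stay temporally consistent. How the information is actually injected at the various stages of the diffusion model is not specified in the abstract.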
