MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation

Abstract

This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. Shot design, a critical aspect of filmmaking, involves meticulously planning both camera movements and object motions in a scene. However, enabling intuitive shot design in modern image-to-video generation systems presents two main challenges: first, effectively capturing user intentions for the motion design, where camera movements and scene-space object motions must be specified jointly; and second, representing motion information in a form that a video diffusion model can use effectively to synthesize image animations. To address these challenges, we introduce MotionCanvas, a method that integrates user-driven controls into image-to-video (I2V) generation models, allowing users to control both object and camera motions in a scene-aware manner. By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis without requiring costly 3D-related training data. MotionCanvas enables users to intuitively depict scene-space motion intentions and translates them into spatiotemporal motion-conditioning signals for video diffusion models. We demonstrate the effectiveness of our method on a wide range of real-world image content and shot-design scenarios, highlighting its potential to enhance creative workflows in digital content creation and to adapt to various image and video editing applications.
