In-2-4D: Inbetweening from Two Single-View Images to 4D Generation

Abstract

We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from a minimal input setting: two single-view images capturing an object in two distinct motion states. Given two images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D. We use a video interpolation model to predict the motion, but large frame-to-frame motions can lead to ambiguous interpretations. To overcome this, we employ a hierarchical approach to identify keyframes that are visually close to the input states and show significant motion, then generate smooth fragments between them. For each fragment, we construct the 3D representation of the keyframe using Gaussian Splatting. The temporal frames within the fragment guide the motion, and a deformation field transforms the keyframe's Gaussians into dynamic Gaussians. To improve temporal consistency and refine the 3D motion, we extend the self-attention of multi-view diffusion across timesteps and apply rigid transformation regularization. Finally, we merge the independently generated 3D motion segments by interpolating boundary deformation fields and optimizing them to align with the guiding video, ensuring smooth and flicker-free transitions. Through extensive qualitative and quantitative experiments as well as a user study, we show the effectiveness of our method and its components. The project page is available at https://in-2-4d.github.io/
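The hierarchical keyframe idea can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a simple recursive bisection, frames are represented as short lists of floats, `midpoint` is a hypothetical placeholder for a video interpolation model, and `mean_abs_diff` is an assumed stand-in for a motion measure. All names and the threshold are illustrative only.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # toy stand-in for an image (e.g. a flattened feature vector)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Assumed motion measure: mean absolute difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def midpoint(a: Frame, b: Frame) -> List[float]:
    """Hypothetical placeholder for the middle frame of a video interpolation model."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def split_keyframes(
    start: Frame,
    end: Frame,
    threshold: float,
    interpolate: Callable[[Frame, Frame], List[float]] = midpoint,
    motion: Callable[[Frame, Frame], float] = mean_abs_diff,
    max_depth: int = 4,
) -> List[List[float]]:
    """Recursively insert keyframes until adjacent keyframes differ by at most `threshold`."""
    if max_depth == 0 or motion(start, end) <= threshold:
        return [list(start), list(end)]
    mid = interpolate(start, end)
    left = split_keyframes(start, mid, threshold, interpolate, motion, max_depth - 1)
    right = split_keyframes(mid, end, threshold, interpolate, motion, max_depth - 1)
    return left + right[1:]  # drop the duplicated middle keyframe


# Two toy "input states"; each adjacent keyframe pair bounds one fragment.
keyframes = split_keyframes([0.0, 0.0], [4.0, 4.0], threshold=1.0)
fragments = list(zip(keyframes[:-1], keyframes[1:]))
```

In this toy setting, each fragment spans a small enough motion that it could be handled independently, which mirrors the abstract's motivation for splitting large motions before generating and later merging the segments.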

