Explorative Inbetweening of Time and Space

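Abstract

We introduce bounded generation as a generalized task for controlling video generation: synthesizing arbitrary camera and subject motion based only on a given start frame and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without any additional training or fine-tuning of the original model. We achieve this with a new sampling strategy, Time Reversal Fusion, which fuses the temporally forward and backward denoising paths conditioned on the start frame and the end frame, respectively. The fused path yields a video that smoothly connects the two frames, producing inbetweening with faithful subject motion, novel views of static scenes, and seamless video loops when the two bounding frames are identical. We curate a diverse evaluation dataset of image pairs and compare against the closest existing methods. Time Reversal Fusion outperforms related work on all subtasks and is able to generate complex motions and 3D-consistent views guided by the bounding frames. See the project page at https://time-reversal.github.io.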

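
The fusion idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. `toy_denoise` is a hypothetical stand-in for one denoising step of an image-to-video model, frames are flat lists of floats rather than images, and fusing the two paths by plain averaging at every step is an assumption made only for illustration. The sketch shows the structure of the sampling loop: a forward path conditioned on the start frame, a backward path run on the time-reversed video and conditioned on the end frame, and a fusion of the two.

```python
import random


def toy_denoise(video, cond_frame, rate=1.0):
    """Hypothetical stand-in for one image-to-video denoising step.

    Pulls each frame toward cond_frame, fully at frame 0 and not at all at
    the last frame, mimicking a model anchored on its first frame.
    """
    n = len(video)
    out = []
    for i, frame in enumerate(video):
        weight = 1.0 - i / (n - 1)  # 1.0 at the first frame, 0.0 at the last
        out.append([p + rate * weight * (c - p) for p, c in zip(frame, cond_frame)])
    return out


def fused_sampling(start, end, num_frames=8, num_steps=40, seed=0):
    """Toy sampling loop that fuses forward and time-reversed backward paths."""
    rng = random.Random(seed)
    # Start from Gaussian noise for every frame.
    video = [[rng.gauss(0.0, 1.0) for _ in start] for _ in range(num_frames)]
    for _ in range(num_steps):
        # Forward path: conditioned on the start frame.
        forward = toy_denoise(video, start)
        # Backward path: reverse time, condition on the end frame, reverse back.
        backward = toy_denoise(video[::-1], end)[::-1]
        # Fusion (assumed here to be a plain average of the two paths).
        video = [
            [0.5 * (f + b) for f, b in zip(f_frame, b_frame)]
            for f_frame, b_frame in zip(forward, backward)
        ]
    return video


if __name__ == "__main__":
    frames = fused_sampling([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
    for i, frame in enumerate(frames):
        print(i, [round(p, 3) for p in frame])
```

In this toy setting the first frame converges to the start frame and the last frame to the end frame, with the intermediate frames in between. Passing the same frame as both `start` and `end` corresponds to the looping case mentioned in the abstract. A real diffusion model would replace `toy_denoise` with a learned noise prediction and a proper noise schedule.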

