SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
November 7, 2024
Authors: Koichi Namekata, Sherwin Bahmani, Ziyi Wu, Yash Kant, Igor Gilitschenski, David B. Lindell
cs.AI
Abstract
Methods for image-to-video generation have achieved impressive,
photo-realistic quality. However, adjusting specific elements in generated
videos, such as object motion or camera movement, is often a tedious process of
trial and error, e.g., involving re-generating videos with different random
seeds. Recent techniques address this issue by fine-tuning a pre-trained model
to follow conditioning signals, such as bounding boxes or point trajectories.
Yet, this fine-tuning procedure can be computationally expensive, and it
requires datasets with annotated object motion, which can be difficult to
procure. In this work, we introduce SG-I2V, a framework for controllable
image-to-video generation that is self-guided, offering
zero-shot control by relying solely on the knowledge present in a pre-trained
image-to-video diffusion model without the need for fine-tuning or external
knowledge. Our zero-shot method outperforms unsupervised baselines while being
competitive with supervised models in terms of visual quality and motion
fidelity.
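
The abstract mentions conditioning signals such as bounding boxes or point trajectories. As a rough illustration only, and not the paper's actual interface, the minimal sketch below shows one way such a control could be represented: a bounding box in the input image plus a per-frame target path for its center. The names `TrajectoryControl` and `linear_trajectory` are hypothetical.

```python
# Hypothetical representation of a trajectory conditioning signal for a
# controllable image-to-video model. Illustrative only; not SG-I2V's API.
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class TrajectoryControl:
    # Bounding box (x_min, y_min, x_max, y_max) in the input image, in pixels.
    bbox: Tuple[float, float, float, float]
    # Target center position (x, y) of the box for each generated frame.
    trajectory: np.ndarray  # shape: (num_frames, 2)


def linear_trajectory(start: Tuple[float, float],
                      end: Tuple[float, float],
                      num_frames: int) -> np.ndarray:
    """Interpolate a straight-line path from start to end over num_frames."""
    t = np.linspace(0.0, 1.0, num_frames)[:, None]
    return (1.0 - t) * np.asarray(start) + t * np.asarray(end)


if __name__ == "__main__":
    num_frames = 14  # e.g., a typical clip length for I2V diffusion models
    # Ask that an object starting in a box on the left drift to the right.
    control = TrajectoryControl(
        bbox=(100.0, 200.0, 220.0, 320.0),
        trajectory=linear_trajectory((160.0, 260.0), (420.0, 260.0), num_frames),
    )
    print(control.trajectory.shape)  # (14, 2)
```

A zero-shot approach like the one described would consume such a signal at inference time, steering the pre-trained diffusion model's sampling rather than fine-tuning its weights on annotated motion data.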