
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

November 7, 2024
Authors: Koichi Namekata, Sherwin Bahmani, Ziyi Wu, Yash Kant, Igor Gilitschenski, David B. Lindell
cs.AI

Abstract

Methods for image-to-video generation have achieved impressive, photo-realistic quality. However, adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error, e.g., involving re-generating videos with different random seeds. Recent techniques address this issue by fine-tuning a pre-trained model to follow conditioning signals, such as bounding boxes or point trajectories. Yet, this fine-tuning procedure can be computationally expensive, and it requires datasets with annotated object motion, which can be difficult to procure. In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. Our zero-shot method outperforms unsupervised baselines while being competitive with supervised models in terms of visual quality and motion fidelity.
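
The abstract does not spell out the mechanism, but the sketch below illustrates, in principle, what self-guided zero-shot trajectory control can look like: at selected denoising steps, the video latents are optimized so that features extracted from the frozen pre-trained model inside a user-specified bounding box stay consistent along the desired path across frames. This is a minimal illustration under those assumptions; the function names (`trajectory_guidance`, `extract_features`, `crop_features`) and the specific loss are hypothetical stand-ins, not the procedure defined in the paper.

```python
# Illustrative sketch only: optimize video latents so that features inside a
# moving bounding box match the first frame's features, using a frozen feature
# extractor as a stand-in for the pre-trained image-to-video diffusion model.
import torch
import torch.nn.functional as F


def crop_features(feats, box):
    """Average-pool a feature map inside a bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return feats[..., y0:y1, x0:x1].mean(dim=(-2, -1))


def trajectory_guidance(latents, extract_features, boxes_per_frame,
                        n_steps=5, lr=0.05):
    """Refine video latents so per-frame box features track the first frame.

    latents:          (frames, channels, height, width) tensor to be refined
    extract_features: callable returning per-frame feature maps from a frozen
                      pre-trained model (hypothetical stand-in here)
    boxes_per_frame:  one bounding box per frame, tracing the desired path
    """
    latents = latents.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latents], lr=lr)
    for _ in range(n_steps):
        feats = extract_features(latents)            # (frames, C, H, W)
        anchor = crop_features(feats[0], boxes_per_frame[0]).detach()
        loss = sum(
            F.mse_loss(crop_features(feats[t], boxes_per_frame[t]), anchor)
            for t in range(1, feats.shape[0])
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return latents.detach()


# Toy usage with random tensors standing in for real diffusion latents/features.
if __name__ == "__main__":
    frames, C, H, W = 8, 4, 32, 32
    latents = torch.randn(frames, C, H, W)
    proj = torch.nn.Conv2d(C, 16, 3, padding=1)      # stand-in feature extractor
    boxes = [(4 + t, 4, 12 + t, 12) for t in range(frames)]  # box drifting right
    refined = trajectory_guidance(latents, lambda z: proj(z), boxes)
    print(refined.shape)
```

In this toy setup, only the latents receive gradient updates; the feature extractor stays frozen, mirroring the zero-shot setting in which the pre-trained model is never fine-tuned.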