SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
November 7, 2024
Authors: Koichi Namekata, Sherwin Bahmani, Ziyi Wu, Yash Kant, Igor Gilitschenski, David B. Lindell
cs.AI
Abstract
Methods for image-to-video generation have achieved impressive,
photo-realistic quality. However, adjusting specific elements in generated
videos, such as object motion or camera movement, is often a tedious process of
trial and error, e.g., involving re-generating videos with different random
seeds. Recent techniques address this issue by fine-tuning a pre-trained model
to follow conditioning signals, such as bounding boxes or point trajectories.
Yet, this fine-tuning procedure can be computationally expensive, and it
requires datasets with annotated object motion, which can be difficult to
procure. In this work, we introduce SG-I2V, a framework for controllable
image-to-video generation that is self-guided, offering
zero-shot control by relying solely on the knowledge present in a pre-trained
image-to-video diffusion model without the need for fine-tuning or external
knowledge. Our zero-shot method outperforms unsupervised baselines while being
competitive with supervised models in terms of visual quality and motion
fidelity.
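
To illustrate the general idea of zero-shot trajectory control, the sketch below shows how one might steer a pre-trained video diffusion model at inference time by optimizing the intermediate latents so that denoiser features inside user-specified, same-sized bounding boxes stay consistent across frames. This is a minimal toy illustration under stated assumptions, not the paper's exact procedure: the ToyDenoiser module, the feature-matching loss, and the scheduler step are placeholders chosen only to keep the example self-contained and runnable.

```python
import torch

# Hypothetical stand-in for a pre-trained image-to-video denoiser. The real
# method builds on an off-the-shelf diffusion model; this toy module exists
# only to keep the sketch self-contained and runnable.
class ToyDenoiser(torch.nn.Module):
    def __init__(self, channels=4):
        super().__init__()
        self.conv = torch.nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, latents, t):
        feats = self.conv(latents)      # (B, C, F, H, W) intermediate features
        return feats, feats             # (predicted noise, feature map) -- both toy

def crop(feat, box):
    # Crop a spatial region (x0, y0, x1, y1) from a (C, H, W) feature map.
    x0, y0, x1, y1 = box
    return feat[:, y0:y1, x0:x1]

def sample_with_trajectory_guidance(denoiser, latents, timesteps, boxes,
                                    lr=0.1, opt_steps=5):
    """At each denoising step, optimize the latents so that features inside a
    same-sized bounding box on every frame match the features inside the first
    frame's box, nudging the object to follow the box trajectory. The loss and
    feature choice here are assumptions made for illustration only."""
    for t in timesteps:
        z = latents.detach().requires_grad_(True)
        optim = torch.optim.Adam([z], lr=lr)
        for _ in range(opt_steps):
            _, feats = denoiser(z, t)
            ref = crop(feats[0, :, 0], boxes[0]).detach()
            loss = sum(
                torch.nn.functional.mse_loss(crop(feats[0, :, f], boxes[f]), ref)
                for f in range(1, feats.shape[2])
            )
            optim.zero_grad()
            loss.backward()
            optim.step()
        with torch.no_grad():
            eps, _ = denoiser(z, t)
            latents = z - 0.1 * eps     # toy update standing in for a real scheduler step
    return latents

# Usage: an 8-frame latent video with a box drifting one pixel per frame.
denoiser = ToyDenoiser()
latents = torch.randn(1, 4, 8, 32, 32)               # (B, C, frames, H, W)
boxes = [(4 + f, 4, 12 + f, 12) for f in range(8)]   # same-sized 8x8 boxes
video_latents = sample_with_trajectory_guidance(denoiser, latents, range(10, 0, -1), boxes)
```

In the actual framework, the guiding features and loss would come from the pre-trained image-to-video diffusion model itself, which is what makes the control self-guided and zero-shot in the sense described in the abstract.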