FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
June 24, 2024
Authors: Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu
cs.AI
Abstract
Diffusion models have demonstrated remarkable capability in video generation,
which has further sparked interest in introducing trajectory control into the
generation process. While existing works mainly focus on training-based methods
(e.g., conditional adapters), we argue that the diffusion model itself allows
decent control over the generated content without requiring any training. In
this study, we introduce a tuning-free framework to achieve
trajectory-controllable video generation by imposing guidance on both noise
construction and attention computation. Specifically, 1) we first show several
instructive phenomena and analyze how initial noise influences the motion
trajectory of generated content. 2) Subsequently, we propose FreeTraj, a
tuning-free approach that enables trajectory control by modifying the noise
sampling and attention mechanisms. 3) Furthermore, we extend FreeTraj to
facilitate longer and larger video generation with controllable trajectories.
Equipped with these designs, users have the flexibility to provide trajectories
manually or opt for trajectories automatically generated by the LLM trajectory
planner. Extensive experiments validate the efficacy of our approach in
enhancing the trajectory controllability of video diffusion models.
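The abstract's core idea, that the initial noise can already encode a motion trajectory, can be illustrated with a toy sketch. The snippet below is a hypothetical simplification, not the authors' method: it re-uses a single shared noise patch at a per-frame bounding-box location, so the initial noise "moves" along the desired path before any denoising happens. The function name, patch size, and box format are all assumptions made for illustration.

```python
import numpy as np

def trajectory_noise(frames, c, h, w, boxes, patch_hw=(8, 8), seed=0):
    """Toy illustration of trajectory-aware noise construction.

    A single noise patch is shared across all frames and pasted at a
    per-frame top-left corner (y0, x0), so the same noise content
    travels along the given trajectory. This is a hypothetical sketch
    of the idea, not FreeTraj's actual noise-sampling scheme.
    """
    rng = np.random.default_rng(seed)
    ph, pw = patch_hw
    noise = rng.standard_normal((frames, c, h, w))   # base i.i.d. noise
    patch = rng.standard_normal((c, ph, pw))         # shared moving patch
    for t, (y0, x0) in enumerate(boxes):
        noise[t, :, y0:y0 + ph, x0:x0 + pw] = patch  # paste along the path
    return noise

# Usage: a patch drifting diagonally across 4 frames of 16x16 latents.
boxes = [(0, 0), (2, 2), (4, 4), (6, 6)]
z = trajectory_noise(frames=4, c=4, h=16, w=16, boxes=boxes)
```

Because the patch content is identical in every frame, the low-level structure inside the box is correlated across time along the trajectory, which is the kind of bias the paper exploits (together with attention guidance) to steer where the subject appears in each frame.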