

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

June 24, 2024
作者: Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu
cs.AI

Abstract

Diffusion models have demonstrated remarkable capability in video generation, which further sparks interest in introducing trajectory control into the generation process. While existing works mainly focus on training-based methods (e.g., conditional adapters), we argue that the diffusion model itself allows decent control over the generated content without requiring any training. In this study, we introduce a tuning-free framework to achieve trajectory-controllable video generation by imposing guidance on both noise construction and attention computation. Specifically, 1) we first show several instructive phenomena and analyze how the initial noise influences the motion trajectory of the generated content. 2) Subsequently, we propose FreeTraj, a tuning-free approach that enables trajectory control by modifying noise sampling and attention mechanisms. 3) Furthermore, we extend FreeTraj to facilitate longer and larger video generation with controllable trajectories. Equipped with these designs, users have the flexibility to provide trajectories manually or opt for trajectories automatically generated by the LLM trajectory planner. Extensive experiments validate the efficacy of our approach in enhancing the trajectory controllability of video diffusion models.
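To make the "modifying noise sampling" idea concrete, here is a minimal illustrative sketch (not the authors' code): since the abstract notes that the initial noise influences the motion trajectory, one can reuse a shared noise patch along a user-specified per-frame bounding-box path when constructing the initial latent noise, biasing the denoiser to place the moving subject along that path. The function name `trajectory_noise`, the fixed 16x16 patch size, and the latent tensor shape are all assumptions for illustration only.

```python
# Hedged sketch of trajectory-guided initial noise construction.
# Assumption: latents are (frames, channels, height, width); the subject's
# per-frame position is given as a list of (top, left) patch origins.
import numpy as np

def trajectory_noise(frames, height, width, channels, boxes, seed=0):
    """Build initial noise whose local statistics repeat along a trajectory.

    boxes: list of (top, left) origins, one per frame, for a fixed-size patch.
    """
    rng = np.random.default_rng(seed)
    ph, pw = 16, 16  # illustrative patch size (assumption, not from the paper)
    shared = rng.standard_normal((channels, ph, pw))   # patch reused each frame
    noise = rng.standard_normal((frames, channels, height, width))
    for t, (top, left) in enumerate(boxes):
        # Overwrite the box region with the same noise patch in every frame,
        # so the "same" content is hinted at each trajectory location.
        noise[t, :, top:top + ph, left:left + pw] = shared
    return noise

# Example: a patch of identical noise drifts left-to-right over 8 frames.
boxes = [(8, 4 * t) for t in range(8)]
eps = trajectory_noise(8, 64, 64, 4, boxes)
```

The attention-side guidance described in the abstract (reweighting attention toward the target region) would complement this at sampling time; the sketch above covers only the noise-construction half.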

