RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
February 21, 2025
Authors: Min Zhao, Guande He, Yixiao Chen, Hongzhou Zhu, Chongxuan Li, Jun Zhu
cs.AI
Abstract
Recent advancements in video generation have enabled models to synthesize
high-quality, minute-long videos. However, generating even longer videos with
temporal coherence remains a major challenge, and existing length extrapolation
methods lead to temporal repetition or motion deceleration. In this work, we
systematically analyze the role of frequency components in positional
embeddings and identify an intrinsic frequency that primarily governs
extrapolation behavior. Based on this insight, we propose RIFLEx, a minimal yet
effective approach that reduces the intrinsic frequency to suppress repetition
while preserving motion consistency, without requiring any additional
modifications. RIFLEx offers a true free lunch: it achieves high-quality
2× extrapolation on state-of-the-art video diffusion transformers in a
completely training-free manner. Moreover, it enhances quality and enables
3× extrapolation with minimal fine-tuning and without any long videos. Project
page and code:
https://riflex-video.github.io/
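The core idea described in the abstract, lowering an intrinsic frequency in the positional embedding so that its period spans the extrapolated length, can be sketched as follows. This is a minimal illustration of the concept, not the official implementation: the function names (`rope_frequencies`, `riflex_frequencies`), the choice of the intrinsic component as the one whose period is closest to the training length, and the exact rescaling rule are all assumptions for illustration.

```python
import numpy as np

def rope_frequencies(dim, theta=10000.0):
    # Standard RoPE-style frequency spectrum: one frequency per pair of
    # embedding dimensions, decaying geometrically from 1 to ~1/theta.
    return theta ** (-np.arange(0, dim, 2) / dim)

def riflex_frequencies(dim, train_len, extrap_len, theta=10000.0):
    # Sketch of the RIFLEx idea (assumed form): identify the "intrinsic"
    # frequency component, here taken to be the one whose period is
    # closest to the training length, and lower it so that a single
    # cycle covers the extrapolated length, suppressing the temporal
    # repetition that otherwise appears beyond the training horizon.
    freqs = rope_frequencies(dim, theta)
    periods = 2 * np.pi / freqs
    k = np.argmin(np.abs(periods - train_len))   # intrinsic component
    freqs[k] = 2 * np.pi / extrap_len            # stretch its period
    return freqs
```

Note that only one frequency component is modified; all other components are left untouched, which is consistent with the abstract's claim that motion consistency is preserved without any additional modifications.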