Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
May 18, 2026
Authors: X. Feng, J. Zhu, M. Wu, C. Chen, F. Mao, H. Guo, J. Wu, X. Chu, K. Huang
cs.AI
Abstract
Train-free long video generation aims to enable foundation video generation models to produce longer videos without incurring significant computational overhead. Frame-level autoregressive frameworks such as FIFO-diffusion can generate infinitely long videos with constant memory consumption. However, the mismatch between training and inference, together with the difficulty of maintaining long-term consistency, limits how effectively the foundation model can be used. To address these issues, we propose MIGA, a novel infinite-frame long video generation method. First, we propose a two-stage alignment mechanism that mitigates the training-inference gap by reducing the excessive noise span fed to the model. We then introduce a dual consistency enhancement mechanism: a self-reflection approach corrects early high-noise frames, and a long-range frame guidance approach leverages later low-noise frames with broad coverage to steer generation; together they improve temporal consistency. Extensive experiments on VBench and NarrLV show that MIGA achieves state-of-the-art performance. Our project page is available at https://xiaokunfeng.github.io/miga_homepage/.
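As a rough illustration of why a frame-level autoregressive queue uses constant memory, the toy sketch below mimics a FIFO-style loop. It is not MIGA and not the FIFO-diffusion implementation: `toy_denoise`, `fifo_generate`, and `queue_len` are hypothetical names, and the "denoiser" only decrements a counter. Each iteration denoises every queued frame by one step, emits the fully denoised head, and appends a fresh pure-noise frame at the tail, so the queue never grows. The frames in the queue carry noise levels from 1 to `queue_len`, which is the kind of noise span that the abstract associates with the training-inference gap.

```python
from collections import deque


def toy_denoise(frame):
    # Hypothetical stand-in for one model denoising step:
    # it only lowers the frame's remaining noise level by one.
    return {"id": frame["id"], "level": frame["level"] - 1}


def fifo_generate(num_frames, queue_len=4):
    # Queue ordered from lowest noise (head) to highest noise (tail).
    queue = deque({"id": i, "level": i + 1} for i in range(queue_len))
    next_id = queue_len
    outputs = []
    peak = len(queue)  # largest queue size observed, to check memory stays flat

    while len(outputs) < num_frames:
        # Denoise every queued frame by one step
        # (a real system would do this in one batched model call).
        queue = deque(toy_denoise(f) for f in queue)

        # The head frame has reached noise level 0: emit it.
        done = queue.popleft()
        assert done["level"] == 0
        outputs.append(done["id"])

        # Enqueue a fresh pure-noise frame at the tail.
        queue.append({"id": next_id, "level": queue_len})
        next_id += 1
        peak = max(peak, len(queue))

    return outputs, peak


outputs, peak = fifo_generate(10, queue_len=4)
print(outputs, peak)
```

In this sketch the number of emitted frames is unbounded while the queue size stays at `queue_len`, which is the property the abstract refers to as constant memory consumption.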