Flowception: Temporally Expansive Flow Matching for Video Generation

December 12, 2025
Authors: Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek, Ricky T. Q. Chen
cs.AI

Abstract

We present Flowception, a novel non-autoregressive, variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with continuous frame denoising. Compared to autoregressive methods, Flowception alleviates error accumulation (drift), because the frame insertion mechanism used during sampling acts as an efficient compression mechanism for long-term context. Compared to full-sequence flows, our method reduces training FLOPs three-fold, is more amenable to local attention variants, and allows the length of a video to be learned jointly with its content. Quantitative experiments show improved FVD and VBench metrics over autoregressive and full-sequence baselines, and qualitative results further support these findings. Finally, by learning to insert and denoise frames within a sequence, Flowception seamlessly integrates different tasks such as image-to-video generation and video interpolation.
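The toy sketch below illustrates what a sampling loop that "interleaves discrete frame insertions with continuous frame denoising" might look like. It is not the authors' implementation, and it makes several hypothetical simplifications:

- A single float stands in for each frame.
- Every frame carries its own flow time.
- New pure-noise frames are inserted into a uniformly random gap with a fixed probability per step.
- An "oracle" velocity that already knows each frame's clean target replaces the learned network.

The names `Frame`, `oracle_velocity`, `sample`, `insert_prob`, and `max_frames` are invented for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Frame:
    x: float       # current state of the frame (one scalar stands in for a whole frame)
    steps: int     # denoising steps taken so far; per-frame flow time t = steps / num_steps
    target: float  # clean value the frame flows toward (toy oracle, hypothetical)


def oracle_velocity(f: Frame, num_steps: int) -> float:
    # Velocity of the linear path x_t = (1 - t) * noise + t * target, expressed
    # through the current state: v = (target - x_t) / (1 - t). A trained model
    # would predict this instead of reading `target`.
    t = f.steps / num_steps
    return (f.target - f.x) / (1.0 - t)


def sample(context: list[float], num_steps: int = 10, max_frames: int = 8,
           insert_prob: float = 0.5, seed: int = 0) -> list[float]:
    """Toy sampler interleaving discrete frame insertions with Euler denoising.

    `context` holds clean conditioning frames: one value loosely mirrors
    image-to-video, two values mirror interpolation between endpoints.
    The insertion rule below is a hypothetical stand-in for a learned one.
    """
    assert context, "need at least one clean conditioning frame"
    rng = random.Random(seed)
    frames = [Frame(x=c, steps=num_steps, target=c) for c in context]
    dt = 1.0 / num_steps
    step = 0
    while step < num_steps or any(f.steps < num_steps for f in frames):
        # Discrete move: during the first num_steps global steps, possibly insert
        # a pure-noise frame (t = 0) into a uniformly random gap.
        if step < num_steps and len(frames) < max_frames and rng.random() < insert_prob:
            gap = rng.randint(0, len(frames))  # 0: before first frame, len: after last
            neighbours = frames[max(gap - 1, 0):gap + 1]
            # Toy content rule: the new frame's clean target is its neighbours' mean.
            target = sum(f.target for f in neighbours) / len(neighbours)
            frames.insert(gap, Frame(x=rng.gauss(0.0, 1.0), steps=0, target=target))
        # Continuous move: one Euler step for every frame that is still noisy,
        # so frames inserted at different times sit at different flow times.
        for f in frames:
            if f.steps < num_steps:
                f.x += dt * oracle_velocity(f, num_steps)
                f.steps += 1
        step += 1
    # Output order is the sequence order; its length is decided by the insertions.
    return [f.x for f in frames]
```

Calling `sample([0.0, 1.0])` treats the two values as clean endpoint frames, loosely analogous to video interpolation. Passing a single value is loosely analogous to image-to-video. The final number of frames depends on how many insertions fire, capped by `max_frames`. This is only a toy analogue of learning a video's length jointly with its content.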
December 17, 2025