

Flowception: Temporally Expansive Flow Matching for Video Generation

December 12, 2025
作者: Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek, Ricky T. Q. Chen
cs.AI

Abstract

We present Flowception, a novel non-autoregressive, variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with continuous frame denoising. Compared to autoregressive methods, Flowception alleviates error accumulation and drift, as the frame insertion performed during sampling acts as an efficient compression of long-term context. Compared to full-sequence flows, our method reduces training FLOPs three-fold, is more amenable to local attention variants, and allows the length of videos to be learned jointly with their content. Quantitative experiments show improved FVD and VBench metrics over autoregressive and full-sequence baselines, which is further supported by qualitative results. Finally, by learning to insert and denoise frames in a sequence, Flowception seamlessly integrates tasks such as image-to-video generation and video interpolation.
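The core mechanism described above, alternating discrete frame insertions with continuous flow-matching denoising steps, can be illustrated with a short sketch. The code below is a minimal, hypothetical sampler, not the paper's implementation: the names (`sample_video`, `velocity_model`, `insert_every`), the simple periodic insertion rule, and the Euler integrator are all assumptions made for illustration only.

```python
# Minimal sketch of an interleaved insert-and-denoise sampling loop.
# Hypothetical: the actual Flowception schedule, insertion rule, and
# architecture are not specified here.
import torch

def sample_video(velocity_model, frame_shape=(3, 64, 64),
                 max_frames=16, num_steps=8, insert_every=2):
    """Alternate discrete frame insertion with continuous flow-matching
    denoising (Euler integration of a learned velocity field)."""
    # Start from a single pure-noise frame; t=0 is noise, t=1 is data.
    frames = torch.randn(1, *frame_shape)   # (T, C, H, W)
    times = torch.zeros(1)                  # per-frame noise level
    dt = 1.0 / num_steps

    for step in range(num_steps):
        # Continuous part: one Euler step of the probability-flow ODE,
        # x_{t+dt} = x_t + v_theta(x_t, t) * dt, applied per frame.
        v = velocity_model(frames, times)
        frames = frames + v * dt
        times = torch.clamp(times + dt, max=1.0)

        # Discrete part: periodically insert a fresh noisy frame so the
        # sequence grows; earlier (cleaner) frames act as compressed
        # long-term context for the new one.
        if step % insert_every == 0 and frames.shape[0] < max_frames:
            frames = torch.cat([frames, torch.randn(1, *frame_shape)], dim=0)
            times = torch.cat([times, torch.zeros(1)], dim=0)

    # A real sampler would keep integrating until every frame reaches t=1;
    # this sketch stops after a fixed number of steps for brevity.
    return frames

if __name__ == "__main__":
    # Stand-in "model" that predicts zero velocity, just to run the loop.
    dummy = lambda x, t: torch.zeros_like(x)
    video = sample_video(dummy)
    print(video.shape)  # torch.Size([5, 3, 64, 64])
```

With `insert_every=2` and `num_steps=8`, the sequence grows from one frame to five over the course of sampling, illustrating how video length can emerge jointly with content rather than being fixed in advance.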