FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
December 31, 2025
Authors: Jibin Song, Mingi Kwon, Jaeseok Jeong, Youngjung Uh
cs.AI
Abstract
In this work, we show that the impact of model capacity varies across timesteps: it is crucial for the early and late stages but largely negligible during the intermediate stage. Accordingly, we propose FlowBlending, a stage-aware multi-model sampling strategy that employs a large model and a small model at capacity-sensitive stages and intermediate stages, respectively. We further introduce simple criteria to choose stage boundaries and provide a velocity-divergence analysis as an effective proxy for identifying capacity-sensitive regions. Across LTX-Video (2B/13B) and WAN 2.1 (1.3B/14B), FlowBlending achieves up to 1.65x faster inference with 57.35% fewer FLOPs, while maintaining the visual fidelity, temporal coherence, and semantic alignment of the large models. FlowBlending is also compatible with existing sampling-acceleration techniques, enabling up to 2x additional speedup. Project page is available at: https://jibin86.github.io/flowblending_project_page.
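The stage-aware switching described above can be illustrated with a minimal sketch. This is not the paper's implementation: the velocity functions are hypothetical stand-ins for the large and small models, and the stage boundaries `t_early`/`t_late` are illustrative placeholders rather than the criteria the paper proposes.

```python
import numpy as np

def large_velocity(x, t):
    # Hypothetical stand-in for the large model's predicted velocity field.
    return -x * (1.0 + 0.1 * t)

def small_velocity(x, t):
    # Hypothetical stand-in for the small model's predicted velocity field.
    return -x

def flowblending_sample(x0, num_steps=50, t_early=0.2, t_late=0.8):
    """Euler sampling over t in [0, 1], switching models by stage.

    The large model handles the capacity-sensitive early and late
    stages; the small model covers the intermediate stage. Boundary
    values here are illustrative, not the paper's chosen criteria.
    """
    x = np.asarray(x0, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for i in range(num_steps):
        t, dt = ts[i], ts[i + 1] - ts[i]
        if t < t_early or t >= t_late:
            v = large_velocity(x, t)   # capacity-sensitive stages
        else:
            v = small_velocity(x, t)   # intermediate stage
        x = x + dt * v
    return x
```

Only the steps inside the intermediate window invoke the small model, which is where the FLOP savings come from; everything else is a standard Euler flow-matching loop.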