FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
December 31, 2025
Authors: Jibin Song, Mingi Kwon, Jaeseok Jeong, Youngjung Uh
cs.AI
Abstract
In this work, we show that the impact of model capacity varies across timesteps: it is crucial for the early and late stages but largely negligible during the intermediate stage. Accordingly, we propose FlowBlending, a stage-aware multi-model sampling strategy that employs a large model and a small model at capacity-sensitive stages and intermediate stages, respectively. We further introduce simple criteria to choose stage boundaries and provide a velocity-divergence analysis as an effective proxy for identifying capacity-sensitive regions. Across LTX-Video (2B/13B) and WAN 2.1 (1.3B/14B), FlowBlending achieves up to 1.65x faster inference with 57.35% fewer FLOPs, while maintaining the visual fidelity, temporal coherence, and semantic alignment of the large models. FlowBlending is also compatible with existing sampling-acceleration techniques, enabling up to 2x additional speedup. Project page is available at: https://jibin86.github.io/flowblending_project_page.
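To make the stage-aware idea concrete, below is a minimal, hypothetical sketch of such a sampling loop: a large model is queried at the capacity-sensitive early and late timesteps, a small model in the intermediate stage, and the latent is advanced with a plain Euler step along the predicted velocity. The function names, boundary values, and model interface are illustrative assumptions, not the released FlowBlending implementation or its boundary-selection criteria.

```python
import torch

def stage_aware_sample(large_model, small_model, x, timesteps,
                       early_boundary=0.8, late_boundary=0.2):
    """Minimal stage-aware multi-model sampling sketch (hypothetical API).

    Assumes flow-matching-style models that map (latent, t) -> velocity,
    and timesteps decreasing from 1 (noise) to 0 (data). The boundary
    values are illustrative placeholders, not the paper's criteria.
    """
    for i, t in enumerate(timesteps[:-1]):
        dt = timesteps[i + 1] - t  # negative step toward data
        # Early (t near 1) and late (t near 0) stages are capacity-sensitive,
        # so route them to the large model; use the small model in between.
        capacity_sensitive = (t >= early_boundary) or (t <= late_boundary)
        model = large_model if capacity_sensitive else small_model
        with torch.no_grad():
            v = model(x, t)   # predicted velocity field at this timestep
        x = x + dt * v        # Euler update along the flow
    return x
```

In this sketch the switch between models happens only at fixed timestep boundaries, which is why the method composes naturally with other sampling-acceleration techniques that reduce the number or cost of steps.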