FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
December 31, 2025
Authors: Jibin Song, Mingi Kwon, Jaeseok Jeong, Youngjung Uh
cs.AI
Abstract
In this work, we show that the impact of model capacity varies across timesteps: it is crucial for the early and late stages but largely negligible during the intermediate stage. Accordingly, we propose FlowBlending, a stage-aware multi-model sampling strategy that employs a large model and a small model at capacity-sensitive stages and intermediate stages, respectively. We further introduce simple criteria to choose stage boundaries and provide a velocity-divergence analysis as an effective proxy for identifying capacity-sensitive regions. Across LTX-Video (2B/13B) and WAN 2.1 (1.3B/14B), FlowBlending achieves up to 1.65x faster inference with 57.35% fewer FLOPs, while maintaining the visual fidelity, temporal coherence, and semantic alignment of the large models. FlowBlending is also compatible with existing sampling-acceleration techniques, enabling up to 2x additional speedup. Project page is available at: https://jibin86.github.io/flowblending_project_page.
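The stage-aware switching described above can be illustrated with a minimal sketch. This is not the paper's implementation: the velocity functions are hypothetical stand-ins for the large and small models, and the stage boundaries `t_early`/`t_late` are illustrative placeholders rather than the criteria the paper proposes.

```python
import numpy as np

def large_velocity(x, t):
    # Hypothetical stand-in for the large model's predicted velocity field.
    return -x * (1.0 + 0.1 * t)

def small_velocity(x, t):
    # Hypothetical stand-in for the small model's predicted velocity field.
    return -x

def flowblending_sample(x0, num_steps=50, t_early=0.2, t_late=0.8):
    """Euler sampling over t in [0, 1], switching models by stage.

    The large model handles the capacity-sensitive early and late
    stages; the small model covers the intermediate stage. Boundary
    values here are illustrative, not the paper's chosen criteria.
    """
    x = np.asarray(x0, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for i in range(num_steps):
        t, dt = ts[i], ts[i + 1] - ts[i]
        if t < t_early or t >= t_late:
            v = large_velocity(x, t)   # capacity-sensitive stages
        else:
            v = small_velocity(x, t)   # intermediate stage
        x = x + dt * v
    return x
```

Only the steps inside the intermediate window invoke the small model, which is where the FLOP savings come from; everything else is a standard Euler flow-matching loop.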