ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation
February 9, 2026
Authors: Zihan Yang, Shuyuan Tu, Licheng Zhang, Qi Dai, Yu-Gang Jiang, Zuxuan Wu
cs.AI
Abstract
Diffusion models have achieved remarkable generation quality, but they incur significant inference cost due to their reliance on multiple sequential denoising steps, motivating recent efforts to distill this inference process into a few-step regime. However, existing distillation methods typically approximate the teacher trajectory with linear shortcuts, which struggle to match the trajectory's constantly changing tangent directions as the velocity evolves across timesteps, leading to quality degradation. To address this limitation, we propose ArcFlow, a few-step distillation framework that explicitly employs non-linear flow trajectories to approximate pre-trained teacher trajectories. Concretely, ArcFlow parameterizes the velocity field underlying the inference trajectory as a mixture of continuous momentum processes. This enables ArcFlow to capture velocity evolution and extrapolate coherent velocities, forming a continuous non-linear trajectory within each denoising step. Importantly, this parameterization admits analytical integration of the non-linear trajectory, circumventing numerical discretization error and yielding a high-precision approximation of the teacher trajectory. To train this parameterization into a few-step generator, we implement ArcFlow via trajectory distillation on pre-trained teacher models using lightweight adapters. This strategy ensures fast, stable convergence while preserving generative diversity and quality. Built on large-scale models (Qwen-Image-20B and FLUX.1-dev), ArcFlow fine-tunes fewer than 5% of the original parameters and achieves a 40x speedup at 2 NFEs over the original multi-step teachers without significant quality degradation. Benchmark experiments demonstrate the effectiveness of ArcFlow both qualitatively and quantitatively.
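The central claim, that a momentum-style velocity parameterization can be integrated in closed form and so avoids the discretization error of stepping along a curved trajectory numerically, can be illustrated with a toy example. The sketch below is a hypothetical illustration under simplified assumptions, not the paper's actual parameterization: it models the velocity over one denoising step as a mixture of exponentially relaxing momentum modes, whose integral has a closed form, and compares the exact update against naive Euler sub-stepping. All function names and coefficients are invented for the demo.

```python
import numpy as np

# Toy model (illustrative only): over one denoising step starting at t0,
# the velocity is a mixture of relaxing momentum modes
#     v(t) = sum_i a_i * exp(-lam_i * (t - t0)),
# so the trajectory update x(t1) = x(t0) + ∫_{t0}^{t1} v(t) dt
# has a closed form and needs no numerical sub-steps.

def analytic_update(x0, amps, lams, t0, t1):
    """Closed-form integral of the mixture velocity over [t0, t1]."""
    dt = t1 - t0
    # ∫_0^dt exp(-lam * s) ds = (1 - exp(-lam * dt)) / lam
    integral = sum(a * (1.0 - np.exp(-lam * dt)) / lam
                   for a, lam in zip(amps, lams))
    return x0 + integral

def euler_update(x0, amps, lams, t0, t1, n_steps):
    """Naive Euler discretization of the same trajectory, for comparison."""
    x, t = x0, t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        v = sum(a * np.exp(-lam * (t - t0)) for a, lam in zip(amps, lams))
        x, t = x + h * v, t + h
    return x

amps, lams = [1.5, -0.4], [2.0, 8.0]   # toy mixture coefficients
x_exact = analytic_update(0.0, amps, lams, 0.0, 1.0)
for n in (1, 4, 16):
    x_euler = euler_update(0.0, amps, lams, 0.0, 1.0, n)
    print(f"Euler n={n:2d}: error vs. analytic = {abs(x_euler - x_exact):.5f}")
```

Running the sketch shows the Euler error shrinking only as more sub-steps are spent, whereas the analytic update is exact at any step size, which is the property the abstract attributes to ArcFlow's momentum-mixture parameterization in the 2-NFE regime.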