Terminal Velocity Matching

November 24, 2025
Authors: Linqi Zhou, Mathias Parger, Ayaan Haque, Jiaming Song
cs.AI

Abstract

We propose Terminal Velocity Matching (TVM), a generalization of flow matching that enables high-fidelity one- and few-step generative modeling. TVM models the transition between any two diffusion timesteps and regularizes its behavior at the terminal time rather than at the initial time. We prove that TVM provides an upper bound on the 2-Wasserstein distance between the data and model distributions when the model is Lipschitz continuous. However, since Diffusion Transformers lack this property, we introduce minimal architectural changes that achieve stable, single-stage training. To make TVM efficient in practice, we develop a fused attention kernel that supports backward passes through Jacobian-vector products and scales well with transformer architectures. On ImageNet-256x256, TVM achieves 3.29 FID with a single function evaluation (NFE) and 1.99 FID with 4 NFEs. On ImageNet-512x512, it achieves 4.32 FID at 1 NFE and 2.94 FID at 4 NFEs, state-of-the-art results among one/few-step models trained from scratch.
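
To make the kernel's purpose concrete: a training loss defined on a Jacobian-vector product (JVP) of the network requires backpropagating through a forward-mode derivative. The sketch below shows this composition in plain PyTorch via `torch.func.jvp`; the toy model, shapes, and loss are illustrative assumptions only, not the paper's TVM objective or its fused kernel.

```python
import torch
from torch.func import jvp

# Illustrative stand-in for a network; the paper's Diffusion
# Transformer and TVM objective are not reproduced here.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64),
)

x = torch.randn(8, 64)  # hypothetical batch of noisy inputs
v = torch.randn(8, 64)  # tangent direction for the directional derivative

# Forward-mode AD: returns f(x) and the JVP (df/dx) v in one pass.
out, dout = jvp(model, (x,), (v,))

# Any loss that involves the JVP makes loss.backward() differentiate
# through the forward-mode computation. This backward-through-JVP
# pattern is what the paper's fused attention kernel is built to
# make efficient at transformer scale.
loss = (out + dout).pow(2).mean()
loss.backward()
```

Naive compositions like this are what the abstract's fused attention kernel replaces inside attention layers, where the cost of differentiating through a JVP otherwise grows prohibitive for large Diffusion Transformers.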