AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Abstract

Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, which limits their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address this limitation, we introduce AnyFlow, the first any-step video diffusion distillation framework based on flow maps. Instead of distilling a model for only a few fixed sampling steps, AnyFlow optimizes the full ODE sampling trajectory. To this end, we shift the distillation target from endpoint consistency mapping ($z_{t} \rightarrow z_{0}$) to flow-map transition learning ($z_{t} \rightarrow z_{r}$) over arbitrary time intervals. We further propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions, enabling efficient on-policy distillation that reduces test-time errors (i.e., discretization error in few-step sampling and exposure bias in causal generation). Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B parameters, demonstrate that AnyFlow achieves performance that matches or surpasses consistency-based counterparts in the few-step regime, while scaling with the sampling-step budget.
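The difference between Euler sampling and flow-map transitions over arbitrary intervals can be illustrated with a minimal toy sketch. It is not the AnyFlow implementation: the scalar ODE dz/dt = z, the time convention (t = 1 for noise, t = 0 for data), and the names `velocity`, `euler_step`, `flow_map_step`, and `sample` are all assumptions chosen for illustration. Here the flow map is known in closed form, whereas in the paper it would be a learned network.

```python
import math


def velocity(z, t):
    # Hypothetical toy velocity field dz/dt = z, standing in for a learned model.
    return z


def euler_step(z, t, r):
    # One Euler step of the ODE from time t to time r (first-order accurate).
    return z + (r - t) * velocity(z, t)


def flow_map_step(z, t, r):
    # Exact flow map z_t -> z_r of the toy ODE: z_r = z_t * exp(r - t).
    # A distilled flow-map model would approximate this kind of jump directly.
    return z * math.exp(r - t)


def sample(z_start, num_steps, step_fn, t_start=1.0, t_end=0.0):
    # Integrate from t_start to t_end on a uniform grid with num_steps transitions.
    times = [t_start + (t_end - t_start) * i / num_steps
             for i in range(num_steps + 1)]
    z = z_start
    for t, r in zip(times[:-1], times[1:]):
        z = step_fn(z, t, r)
    return z


if __name__ == "__main__":
    exact = math.exp(-1.0)  # true solution z_0 for z_1 = 1
    for n in (1, 2, 4, 50):
        err_euler = abs(sample(1.0, n, euler_step) - exact)
        err_flow = abs(sample(1.0, n, flow_map_step) - exact)
        print(f"steps={n:3d}  euler_error={err_euler:.6f}  flow_map_error={err_flow:.2e}")
```

In this toy setting the flow-map sampler reaches the same endpoint for any step count, because composing exact transitions over sub-intervals equals one transition over the whole interval, while the Euler sampler only approaches that endpoint as the number of steps grows. A learned flow map would only approximate this behavior.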