Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers

Abstract

Diffusion Transformers have established a new state of the art in image synthesis, but the high computational cost of iterative sampling severely hampers their practical deployment. Existing acceleration methods often focus on the temporal domain and overlook the substantial spatial redundancy inherent in the generative process, where global structures emerge long before fine-grained details form. Treating all spatial regions with uniform computation is therefore a critical inefficiency. In this paper, we introduce Just-in-Time (JiT), a novel training-free framework that addresses this challenge through acceleration in the spatial domain. JiT formulates a spatially approximated generative ordinary differential equation (ODE) that drives the evolution of the full latent state using computations from a dynamically selected, sparse subset of anchor tokens. To ensure seamless transitions as new tokens are incorporated and the latent state grows in dimension, we propose a deterministic micro-flow, a simple and effective finite-time ODE that maintains both structural coherence and statistical correctness. Extensive experiments on the state-of-the-art FLUX.1-dev model demonstrate that JiT achieves up to a 7x speedup with nearly lossless performance, significantly outperforming existing acceleration methods and establishing a superior trade-off between inference speed and generation fidelity.
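
To make the idea of an anchor-driven ODE step concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes a 1-D row of scalar "tokens", a hypothetical stand-in velocity function `toy_velocity` in place of the transformer, fixed anchor positions, and piecewise-linear interpolation to spread the anchor velocities over all tokens. The abstract does not specify how JiT selects anchors or how it extends their velocities to the full state, so these choices are illustrative assumptions only.

```python
def interpolate_from_anchors(anchor_idx, anchor_vals, n):
    """Piecewise-linear interpolation of anchor values onto all n positions.

    Assumes anchor_idx is sorted and strictly increasing. Positions outside
    the anchor range take the value of the nearest anchor.
    """
    out = [0.0] * n
    for i in range(n):
        if i <= anchor_idx[0]:
            out[i] = anchor_vals[0]
        elif i >= anchor_idx[-1]:
            out[i] = anchor_vals[-1]
        else:
            # Find the pair of anchors that brackets position i.
            for j in range(len(anchor_idx) - 1):
                lo, hi = anchor_idx[j], anchor_idx[j + 1]
                if lo <= i <= hi:
                    w = (i - lo) / (hi - lo)
                    out[i] = (1 - w) * anchor_vals[j] + w * anchor_vals[j + 1]
                    break
    return out


def toy_velocity(x, t):
    """Hypothetical stand-in for the expensive model call: v(x, t) = -x."""
    return [-v for v in x]


def sparse_euler_step(state, anchor_idx, t, dt, velocity_fn=toy_velocity):
    """One explicit Euler step where the model is evaluated only on anchors."""
    anchor_state = [state[i] for i in anchor_idx]
    anchor_vel = velocity_fn(anchor_state, t)  # cost scales with the number of anchors
    full_vel = interpolate_from_anchors(anchor_idx, anchor_vel, len(state))
    return [x + dt * v for x, v in zip(state, full_vel)]


state = [1.0, 2.0, 3.0, 4.0, 5.0]
new_state = sparse_euler_step(state, anchor_idx=[0, 2, 4], t=0.0, dt=0.1)
```

In this toy case the velocity is linear in the state and the state is linear in position, so interpolating from three anchors reproduces the full-evaluation Euler step exactly; with a real model the interpolated velocity would only approximate it.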
