Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers

Abstract

Diffusion Transformers have established a new state of the art in image synthesis, but the high computational cost of iterative sampling severely hampers their practical deployment. Existing acceleration methods mostly focus on the temporal domain and overlook the substantial spatial redundancy inherent in the generative process, where global structures emerge long before fine-grained details are formed. Treating all spatial regions with uniform computation is therefore a critical inefficiency. In this paper, we introduce Just-in-Time (JiT), a novel training-free framework that addresses this challenge by accelerating generation in the spatial domain. JiT formulates a spatially approximated generative ordinary differential equation (ODE) that drives the evolution of the full latent state using computations from a dynamically selected, sparse subset of anchor tokens. To ensure seamless transitions as new tokens are incorporated and the dimension of the latent state expands, we propose a deterministic micro-flow, a simple and effective finite-time ODE that maintains both structural coherence and statistical correctness. Extensive experiments on the state-of-the-art FLUX.1-dev model demonstrate that JiT achieves up to a 7x speedup with nearly lossless performance, significantly outperforming existing acceleration methods and establishing a superior trade-off between inference speed and generation fidelity.
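To make the idea of a spatially approximated ODE step concrete, the following is a toy sketch only, not the JiT algorithm itself. It assumes a 1D row of scalar tokens, a caller-supplied list of anchor indices (the abstract says anchors are selected dynamically but does not say how), linear interpolation to fill in velocities at non-anchor tokens, and a plain Euler update. The names `interpolate_from_anchors`, `sparse_euler_step`, and `toy_velocity` are hypothetical and stand in for components the abstract does not specify.

```python
from typing import Callable, List, Sequence


def interpolate_from_anchors(anchor_idx: Sequence[int],
                             anchor_vals: Sequence[float],
                             n: int) -> List[float]:
    """Spread values known at sorted anchor positions over all n tokens.

    Assumption: linear interpolation between neighbouring anchors and
    constant extension outside the first/last anchor.
    """
    out = [0.0] * n
    for i in range(n):
        if i <= anchor_idx[0]:
            out[i] = anchor_vals[0]
        elif i >= anchor_idx[-1]:
            out[i] = anchor_vals[-1]
        else:
            # Find the pair of anchors that brackets token i.
            for j in range(len(anchor_idx) - 1):
                lo, hi = anchor_idx[j], anchor_idx[j + 1]
                if lo <= i <= hi:
                    w = (i - lo) / (hi - lo)
                    out[i] = (1.0 - w) * anchor_vals[j] + w * anchor_vals[j + 1]
                    break
    return out


def sparse_euler_step(x: Sequence[float],
                      t: float,
                      dt: float,
                      velocity: Callable[[float, float], float],
                      anchor_idx: Sequence[int]) -> List[float]:
    """One Euler step that evaluates the velocity only at anchor tokens.

    `velocity` is a hypothetical per-token stand-in for the expensive
    model call; every non-anchor token reuses an interpolated velocity.
    """
    anchor_v = [velocity(x[i], t) for i in anchor_idx]
    v_full = interpolate_from_anchors(anchor_idx, anchor_v, len(x))
    return [xi + dt * vi for xi, vi in zip(x, v_full)]


if __name__ == "__main__":
    def toy_velocity(xi: float, t: float) -> float:
        # Hypothetical velocity field used only for this demo.
        return 2.0 * xi

    state = [float(i) for i in range(8)]
    # Only 3 of the 8 tokens are evaluated by the "model".
    print(sparse_euler_step(state, t=0.0, dt=0.5,
                            velocity=toy_velocity, anchor_idx=[0, 3, 7]))
```

In this sketch the cost of a step scales with the number of anchors rather than the number of tokens, which is the kind of saving the abstract attributes to spatial acceleration. How anchors are chosen and how newly activated tokens are initialized (the micro-flow) are left out because the abstract does not describe them.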
