Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Abstract


Step distillation has become a leading technique for accelerating diffusion models, with Distribution Matching Distillation (DMD) and Consistency Distillation as two representative paradigms. Consistency methods enforce self-consistency along the full PF-ODE trajectory, steering it toward the clean data manifold, whereas vanilla DMD relies on sparse supervision at a few predefined discrete timesteps. This restricted discrete-time formulation, combined with the mode-seeking nature of the reverse KL divergence, tends to produce visual artifacts and over-smoothed outputs, and often necessitates complex auxiliary modules (such as GANs or reward models) to restore visual fidelity. In this work, we introduce Continuous-Time Distribution Matching (CDM), which for the first time migrates the DMD framework from discrete anchoring to continuous optimization. CDM achieves this through two continuous-time designs. First, we replace the fixed discrete schedule with a dynamic continuous schedule of random length, so that distribution matching is enforced at arbitrary points along sampling trajectories rather than only at a few fixed anchors. Second, we propose a continuous-time alignment objective that performs active off-trajectory matching on latents extrapolated via the student's velocity field, improving generalization and preserving fine visual details. Extensive experiments on different architectures, including SD3-Medium and Longcat-Image, demonstrate that CDM delivers highly competitive visual fidelity for few-step image generation without relying on complex auxiliary objectives. Code is available at https://github.com/byliutao/cdm.
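The two continuous-time designs can be illustrated with a minimal sketch. It assumes a rectified-flow parameterization x_t = (1 - t) x_0 + t ε (the convention used by SD3), in which a velocity v ≈ ε - x_0 lets a latent at time t be extrapolated to any earlier time s via x_s = x_t + (s - t) v. All function names, the step-count bounds, and the uniform sampling of times are hypothetical illustrations, not the released CDM code; the distribution-matching loss itself is omitted.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical signature: velocity(x_t, t) -> predicted velocity (eps - x0) at time t.
VelocityFn = Callable[[Vector, float], Vector]


def sample_continuous_schedule(rng: random.Random, min_steps: int = 1,
                               max_steps: int = 4) -> List[float]:
    """Draw a random-length, non-increasing schedule from t=1 (noise) to t=0 (data)."""
    num_steps = rng.randint(min_steps, max_steps)
    # Interior times are drawn uniformly at random, so no fixed anchors are reused.
    interior = sorted((rng.random() for _ in range(num_steps - 1)), reverse=True)
    return [1.0] + interior + [0.0]


def extrapolate(x_t: Vector, t: float, s: float, velocity: VelocityFn) -> Vector:
    """Euler step from time t to time s along the (student's) velocity field."""
    v = velocity(x_t, t)
    return [xi + (s - t) * vi for xi, vi in zip(x_t, v)]


def rollout_with_offtrajectory_points(
        x_noise: Vector, velocity: VelocityFn, rng: random.Random,
        min_steps: int = 1, max_steps: int = 4,
) -> Tuple[Vector, List[Tuple[float, Vector]]]:
    """Run the student on a random schedule and collect extrapolated latents.

    Returns the final sample and a list of (s, x_s) pairs at which a
    distribution-matching loss could be applied (the loss itself is omitted).
    """
    schedule = sample_continuous_schedule(rng, min_steps, max_steps)
    x = x_noise
    matched: List[Tuple[float, Vector]] = []
    for t, t_next in zip(schedule[:-1], schedule[1:]):
        # Pick an arbitrary time between two schedule points (off the schedule).
        s = rng.uniform(t_next, t)
        matched.append((s, extrapolate(x, t, s, velocity)))
        # Advance the trajectory to the next schedule point.
        x = extrapolate(x, t, t_next, velocity)
    return x, matched
```

With the exact straight-line velocity ε - x_0, the rollout lands on x_0 and every extrapolated x_s lies on the interpolation path; a learned student only approximates this, which is where matching at the extrapolated latents would supply a training signal.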
