High-Fidelity 2-Step Image Generation via Teacher-Aligned End-to-End Distillation

Abstract



Few-step diffusion distillation has matured considerably for 4- to 8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. Our method targets the two central bottlenecks of 2-step generation, namely increased task difficulty and limited model capacity, through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images instead of external real images as the real samples for GAN training, providing an adversarial target that is both more attainable and more informative. Second, we adopt Step-Decoupled Parameterization, which assigns independent model parameters to each of the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, which lets the first step receive gradients from the final image quality while an explicit step-1 loss keeps the intermediate generation meaningful. Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.
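To make the three design choices concrete, here is a toy NumPy sketch of how they might fit together in one training loop. It is not the authors' implementation: the 1-D affine "teacher" and "student", the logistic discriminator, the hand-derived gradients, and every function name are hypothetical stand-ins. The abstract does not specify the form of the step-1 loss, so this sketch assumes an MSE regression of the student's first-step output onto the teacher's midpoint state (after 4 of its 8 steps). The three ideas appear as: teacher samples used as the discriminator's "real" batch, a separate `(a, b)` parameter pair per denoising step, and a final-image loss whose gradient is propagated through step 2 into step 1 (toggled by `end_to_end`).

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def teacher_sample(z, n_steps=8, alpha=0.8, beta=0.5):
    """Hypothetical 8-step teacher: a fixed affine update applied n_steps times.

    Returns the final sample and the midpoint state, which this sketch
    assumes is the target for the student's first step.
    """
    x, mid = z, None
    for k in range(n_steps):
        x = alpha * x + beta
        if k == n_steps // 2 - 1:
            mid = x
    return x, mid


def student_forward(z, params):
    """Two-step student with step-decoupled parameters: each step owns (a, b)."""
    a1, b1 = params["step1"]
    a2, b2 = params["step2"]
    x1 = a1 * z + b1
    x2 = a2 * x1 + b2
    return x1, x2


def generator_grads(z, params, disc, lam=1.0, end_to_end=True):
    """Gradients of L = L_adv(x2) + lam * L_step1(x1) w.r.t. both steps' params."""
    w, c = disc
    x1, x2 = student_forward(z, params)
    _, mid = teacher_sample(z)
    n = z.size
    # Non-saturating generator loss mean(-log D(x2)), differentiated w.r.t. x2.
    g_x2 = -(1.0 - sigmoid(w * x2 + c)) * w / n
    a2, _ = params["step2"]
    # End-to-end: backpropagate the final-image loss through step 2 into x1.
    # The detached baseline (end_to_end=False) gives step 1 only its own loss.
    g_x1 = g_x2 * a2 if end_to_end else np.zeros_like(x1)
    # Explicit step-1 regularizer (assumed form): MSE to the teacher midpoint.
    g_x1 = g_x1 + lam * 2.0 * (x1 - mid) / n
    return {
        "step1": (np.sum(g_x1 * z), np.sum(g_x1)),
        "step2": (np.sum(g_x2 * x1), np.sum(g_x2)),
    }


def discriminator_grads(z_real, z_fake, params, disc):
    """Distribution-aligned: the 'real' batch comes from the teacher, not a dataset."""
    w, c = disc
    real, _ = teacher_sample(z_real)
    _, fake = student_forward(z_fake, params)
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    # Gradient of mean(-log D(real)) + mean(-log(1 - D(fake))) w.r.t. (w, c).
    gw = np.mean(-(1.0 - d_real) * real) + np.mean(d_fake * fake)
    gc = np.mean(-(1.0 - d_real)) + np.mean(d_fake)
    return gw, gc


def train(steps=200, lr=0.05, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    params = {"step1": (1.0, 0.0), "step2": (1.0, 0.0)}
    disc = (0.0, 0.0)
    for _ in range(steps):
        z = rng.standard_normal(256)
        gw, gc = discriminator_grads(rng.standard_normal(256), z, params, disc)
        disc = (disc[0] - lr * gw, disc[1] - lr * gc)
        grads = generator_grads(z, params, disc, lam=lam)
        # Plain gradient descent on each step's own parameters.
        params = {
            k: (params[k][0] - lr * grads[k][0], params[k][1] - lr * grads[k][1])
            for k in params
        }
    return params, disc
```

Setting `end_to_end=False` with `lam=0.0` leaves the step-1 gradients at exactly zero, which mirrors a stage-wise baseline where step 1 never sees the final-image loss. With `end_to_end=True`, step 1 receives the chain-rule term through `a2` even without the regularizer. In a real model the manual derivatives would be replaced by an autodiff framework, where the baseline corresponds to detaching `x1` before step 2.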