Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Abstract

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this insight, we propose Calibri, a parameter-efficient approach that optimally calibrates DiT components to elevate generative quality. Calibri frames DiT calibration as a black-box reward optimization problem, which it solves efficiently with an evolutionary algorithm while modifying only about 100 parameters. Experimental results show that, despite its lightweight design, Calibri consistently improves performance across various text-to-image models. Notably, Calibri also reduces the number of inference steps required for image generation while maintaining high-quality outputs.
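
To make the idea of black-box calibration concrete, the following is a minimal sketch of an evolution strategy that searches over roughly 100 scaling coefficients using only reward values. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which evolutionary algorithm is used or where the scales are applied, so the simple elite-averaging strategy, the `toy_reward` function, and all hyperparameters here are hypothetical stand-ins. In a real setting the reward would presumably come from generating images with the scaled model and scoring them.

```python
import numpy as np


def toy_reward(scales: np.ndarray) -> float:
    """Hypothetical stand-in for a black-box reward.

    A real reward would score images generated with the scaled model.
    This placeholder simply peaks when the scales match a hidden target.
    """
    target = np.linspace(0.8, 1.2, scales.size)  # assumed "ideal" scales
    return -float(np.sum((scales - target) ** 2))


def evolve_scales(reward_fn, num_params=100, population=32, elite=8,
                  sigma=0.05, generations=200, seed=0):
    """Simple elite-averaging evolution strategy over scaling coefficients.

    Only reward values are used, no gradients, which is what makes the
    optimization "black-box".
    """
    rng = np.random.default_rng(seed)
    mean = np.ones(num_params)  # all scales = 1 means "no calibration"
    best_scales, best_reward = mean.copy(), reward_fn(mean)

    for _ in range(generations):
        # Sample candidate scale vectors around the current search centre.
        noise = rng.standard_normal((population, num_params))
        candidates = mean + sigma * noise
        rewards = np.array([reward_fn(c) for c in candidates])

        # Keep the top-scoring candidates and recentre the search on them.
        elite_idx = np.argsort(rewards)[-elite:]
        mean = candidates[elite_idx].mean(axis=0)

        # Track the best individual candidate seen so far.
        top = elite_idx[-1]
        if rewards[top] > best_reward:
            best_scales, best_reward = candidates[top].copy(), float(rewards[top])

    return best_scales, float(best_reward)


if __name__ == "__main__":
    scales, reward = evolve_scales(toy_reward)
    print("baseline reward:", toy_reward(np.ones(100)))
    print("calibrated reward:", reward)
```

Because the search space is only on the order of 100 values, each generation costs `population` reward evaluations and nothing else, which is why a gradient-free method is plausible here even when the reward itself is expensive to compute.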
