Trajectory Consistency Distillation

Abstract

The Latent Consistency Model (LCM) extends the Consistency Model to the latent space and uses guided consistency distillation to achieve impressive performance in accelerating text-to-image synthesis. However, we observe that LCM struggles to generate images that are both sharp and rich in fine detail. To address this limitation, we first investigate and explain the underlying causes. Our investigation shows that the primary issue stems from errors in three distinct areas. Consequently, we introduce Trajectory Consistency Distillation (TCD), which comprises a trajectory consistency function and strategic stochastic sampling. The trajectory consistency function reduces distillation errors by broadening the scope of the self-consistency boundary condition, which enables TCD to accurately trace the entire trajectory of the Probability Flow ODE. In addition, strategic stochastic sampling is designed specifically to avoid the errors that accumulate in multi-step consistency sampling, and it is tailored to complement the TCD model. Experiments demonstrate that TCD not only significantly improves image quality at a low number of function evaluations (NFEs) but also yields more detailed results than the teacher model at high NFEs.
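
As a rough sketch of what "broadening the scope of the self-consistency boundary condition" can mean, the two mappings may be written side by side. The notation here is an assumption introduced for illustration and does not appear in the abstract: x_t denotes a point at time t on a Probability Flow ODE trajectory, \epsilon a small starting time, T the final time, and s an intermediate target time.

```latex
% Illustrative notation only (assumed, not taken from the abstract).
\begin{align}
  % Consistency function: every point on a trajectory maps to its origin.
  f(x_t, t) &= x_\epsilon,
  & f(x_\epsilon, \epsilon) &= x_\epsilon, \\
  % Trajectory consistency function: every point maps to any earlier point s.
  f(x_t, t, s) &= x_s,
  & f(x_s, s, s) &= x_s,
  \qquad \epsilon \le s \le t \le T .
\end{align}
```

Under this reading, the first boundary condition constrains the function only at the trajectory origin, whereas the second constrains it at every intermediate time s, which is one way to understand how the model can follow the whole trajectory rather than only its endpoint.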