EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization

Abstract

Recent advances in accelerating text-to-image (T2I) diffusion models have enabled the synthesis of high-fidelity images even in a single step. However, personalizing these models to incorporate novel concepts remains challenging, because one-step models have limited capacity to capture new concept distributions effectively. We propose EchoDistill, a bidirectional concept distillation framework that enables one-step diffusion personalization (1-SDP). Our approach is an end-to-end training process in which a multi-step diffusion model (the teacher) and a one-step diffusion model (the student) are trained simultaneously. The concept is first distilled from the teacher to the student and then echoed back from the student to the teacher. During EchoDistill, the two models share the text encoder to ensure consistent semantic understanding. Following this, the student model is optimized with adversarial losses, to align with the real image distribution, and with alignment losses, to maintain consistency with the teacher's output. Furthermore, we introduce a bidirectional echoing refinement strategy, in which the student model leverages its faster generation to provide feedback to the teacher model. This bidirectional concept distillation mechanism not only enhances the student's ability to personalize novel concepts but also improves the generative quality of the teacher model. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods in the 1-SDP setting, establishing a novel paradigm for rapid and effective personalization in T2I diffusion models.
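
To make the student objective described above more concrete, the following is a minimal sketch of how an adversarial term and an alignment term might be combined into one scalar loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss forms or their weighting, so the mean-squared-error alignment term, the non-saturating adversarial term, and the names `lambda_adv` and `lambda_align` are all hypothetical.

```python
import math


def alignment_loss(student_out, teacher_out):
    """Mean squared error between student and teacher outputs.

    Assumption: the abstract only says "alignment losses"; MSE is a stand-in.
    """
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def adversarial_loss(disc_logits):
    """Non-saturating generator loss: mean of softplus(-logit).

    Assumption: the abstract only says "adversarial losses"; this common
    GAN generator loss is a stand-in. `disc_logits` are hypothetical
    discriminator scores for student-generated images.
    """
    return sum(_softplus(-z) for z in disc_logits) / len(disc_logits)


def student_objective(student_out, teacher_out, disc_logits,
                      lambda_adv=1.0, lambda_align=1.0):
    """Weighted sum of the two terms; both weights are hypothetical."""
    return (lambda_adv * adversarial_loss(disc_logits)
            + lambda_align * alignment_loss(student_out, teacher_out))
```

In this sketch the alignment term vanishes when the student reproduces the teacher's output exactly, while the adversarial term shrinks as the hypothetical discriminator scores the student's images as more realistic.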
