EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization

Abstract


Recent advances in accelerating text-to-image (T2I) diffusion models have enabled the synthesis of high-fidelity images in as few as a single step. However, personalizing these models to incorporate novel concepts remains challenging, because one-step models have limited capacity to capture new concept distributions effectively. We propose EchoDistill, a bidirectional concept distillation framework that enables one-step diffusion personalization (1-SDP). Our approach uses an end-to-end training process in which a multi-step diffusion model (the teacher) and a one-step diffusion model (the student) are trained simultaneously. The concept is first distilled from the teacher to the student and then echoed back from the student to the teacher. During EchoDistill, the two models share a single text encoder to ensure consistent semantic understanding. The student is optimized with adversarial losses, which align it with the real image distribution, and with alignment losses, which keep it consistent with the teacher's output. Furthermore, we introduce a bidirectional echoing refinement strategy, in which the student leverages its faster generation to provide feedback to the teacher. This bidirectional concept distillation mechanism not only enhances the student's ability to personalize novel concepts but also improves the generative quality of the teacher. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods in the 1-SDP setting, establishing a novel paradigm for rapid and effective personalization in T2I diffusion models.
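To make the student objective described above more concrete, here is a minimal, stdlib-only Python sketch under toy assumptions; it is an illustration, not the authors' implementation. Images are plain floats, and `shared_text_encoder`, `teacher_sample`, and `student_sample` are hypothetical stand-ins for the shared text encoder and the multi-step and one-step models. The adversarial term is assumed to be the non-saturating GAN generator loss on discriminator logits, and the alignment term is assumed to be a mean squared error. The abstract confirms neither choice, and the student-to-teacher echo step is left out because its exact form is not specified here.

```python
import math
from typing import Sequence


def shared_text_encoder(prompt: str) -> float:
    # Hypothetical toy encoder: the SAME function embeds prompts for both
    # the teacher and the student, mirroring the shared text encoder.
    return len(prompt) / 10.0


def teacher_sample(emb: float, noise: float, steps: int = 4) -> float:
    # Hypothetical toy multi-step sampler: each step moves the sample
    # halfway from its current value toward the embedding.
    x = noise
    for _ in range(steps):
        x += 0.5 * (emb - x)
    return x


def student_sample(emb: float, noise: float, gain: float) -> float:
    # Hypothetical toy one-step generator: a single call maps
    # (embedding, noise) to an "image" via one learnable parameter `gain`.
    return gain * emb + 0.1 * noise


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def adversarial_loss(fake_logits: Sequence[float]) -> float:
    # Assumed non-saturating generator loss:
    # mean of -log(sigmoid(logit)), which equals softplus(-logit).
    return sum(softplus(-z) for z in fake_logits) / len(fake_logits)


def alignment_loss(student_imgs: Sequence[float], teacher_imgs: Sequence[float]) -> float:
    # Assumed mean squared error between paired student and teacher outputs.
    if len(student_imgs) != len(teacher_imgs):
        raise ValueError("student and teacher batches must be paired")
    pairs = zip(student_imgs, teacher_imgs)
    return sum((s - t) ** 2 for s, t in pairs) / len(student_imgs)


def student_objective(
    fake_logits: Sequence[float],
    student_imgs: Sequence[float],
    teacher_imgs: Sequence[float],
    lam_align: float = 1.0,
) -> float:
    # Adversarial term: pull the student toward real images of the concept.
    # Alignment term: keep the student consistent with the teacher's output.
    return adversarial_loss(fake_logits) + lam_align * alignment_loss(student_imgs, teacher_imgs)


if __name__ == "__main__":
    emb = shared_text_encoder("a photo of my dog")  # one embedding, used by both models
    noises = [0.0, 1.0]
    teacher_imgs = [teacher_sample(emb, n) for n in noises]
    student_imgs = [student_sample(emb, n, gain=0.9) for n in noises]
    fake_logits = [0.0, 0.0]  # placeholder discriminator outputs on student images
    print(student_objective(fake_logits, student_imgs, teacher_imgs, lam_align=0.5))
```

In this sketch the hypothetical weight `lam_align` trades off fidelity to real images of the concept against consistency with the teacher; the abstract does not state how the two losses are actually weighted.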