X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

Abstract

Automatic text-to-3D content creation has recently made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize a 3D representation so that its rendered images align well with the given text, as evaluated by the pretrained 2D diffusion model. However, a substantial domain gap exists between 2D images and 3D assets, primarily because they differ in camera-related attributes and because 3D assets contain only foreground objects. Consequently, using 2D diffusion models directly to optimize 3D representations may lead to suboptimal results. To address this issue, we present X-Dreamer, a novel approach for high-quality text-to-3D content creation that effectively bridges the gap between text-to-2D and text-to-3D synthesis. X-Dreamer has two key components: Camera-Guided Low-Rank Adaptation (CG-LoRA) and Attention-Mask Alignment (AMA) Loss. CG-LoRA dynamically incorporates camera information into the pretrained diffusion model by generating its trainable parameters in a camera-dependent manner. This integration improves the alignment between the generated 3D assets and the camera's perspective. The AMA loss guides the attention map of the pretrained diffusion model with the binary mask of the 3D object, prioritizing the creation of the foreground object. This ensures that the model focuses on generating accurate and detailed foreground objects. Extensive evaluations demonstrate the effectiveness of our proposed method compared to existing text-to-3D approaches. Our project webpage: https://xmuxiaoma666.github.io/Projects/X-Dreamer
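
The two components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so everything below is an assumption made for illustration. The function names (`cg_lora_forward`, `ama_loss`), the linear projections `P_A` and `P_B` that map a camera feature vector to the low-rank factors, and the choice of binary cross-entropy as the alignment loss are all hypothetical.

```python
import numpy as np


def cg_lora_forward(x, W, camera, P_A, P_B, rank, scale=1.0):
    """Camera-conditioned low-rank update (illustrative assumption).

    x:      (n, d_in)  input features
    W:      (d_in, d_out) frozen pretrained weight
    camera: (c,) camera feature vector (hypothetical encoding)
    P_A:    (c, d_in * rank)  trainable projection producing factor A
    P_B:    (c, rank * d_out) trainable projection producing factor B
    """
    d_in, d_out = W.shape
    # The low-rank factors are generated from the camera vector, so the
    # weight update changes with the viewpoint instead of being fixed.
    A = (camera @ P_A).reshape(d_in, rank)
    B = (camera @ P_B).reshape(rank, d_out)
    return x @ W + scale * (x @ A) @ B


def ama_loss(attention, mask, eps=1e-8):
    """Attention-mask alignment as binary cross-entropy (assumed form).

    attention: (h, w) non-negative attention map
    mask:      (h, w) binary foreground mask with values in {0, 1}
    """
    # Min-max normalise the attention map to [0, 1] so it is comparable
    # to the binary mask, then clip to keep the logarithms finite.
    a = (attention - attention.min()) / (attention.max() - attention.min() + eps)
    a = np.clip(a, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(a) + (1.0 - mask) * np.log(1.0 - a)))
```

In this sketch, a zero `P_B` leaves the pretrained layer unchanged, which mirrors the usual LoRA convention of starting from the frozen model. The loss is small when high attention coincides with the foreground mask and large when attention falls on the background.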
