TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation

Abstract

Texturing 3D humans with semantic UV maps remains a challenge because reasonably unfolded UV maps are difficult to acquire. Despite recent text-to-3D advances that supervise multi-view renderings using large text-to-image (T2I) models, issues persist with generation speed, text consistency, and texture quality, resulting in a scarcity of data among existing datasets. We present TexDreamer, the first zero-shot multimodal high-fidelity 3D human texture generation model. Using an efficient texture adaptation fine-tuning strategy, we adapt a large T2I model to a semantic UV structure while preserving its original generalization capability. Leveraging a novel feature translator module, the trained model can generate high-fidelity 3D human textures from either text or an image within seconds. Furthermore, we introduce ArTicuLated humAn textureS (ATLAS), the largest high-resolution (1024 × 1024) 3D human texture dataset, which contains 50k high-fidelity textures with text descriptions.

