LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
March 22, 2024
Authors: Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng
cs.AI
Abstract
Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is 1) building a scalable architecture and 2) leveraging 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to achieve robustness to diverse and complex training prompts. LATTE3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. LATTE3D generates 3D objects in 400ms, and can be further enhanced with fast test-time optimization.
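
To make the amortized setting concrete, below is a minimal, hypothetical PyTorch sketch of the two-stage usage pattern the abstract describes: a single forward pass through a network trained over many prompts, optionally followed by a short test-time refinement loop. The `AmortizedTextTo3D` module, its vertex/color heads, and the placeholder refinement objective are all illustrative assumptions, not the LATTE3D architecture (which amortizes neural field and textured surface generation with 3D-aware diffusion priors).

```python
import torch
import torch.nn as nn

class AmortizedTextTo3D(nn.Module):
    """Illustrative amortized generator: one network, trained across many
    prompts, maps a text embedding to a textured shape in a single forward
    pass (here, per-vertex offsets and colors on a fixed-topology template
    mesh; LATTE3D itself uses neural fields / textured surfaces)."""

    def __init__(self, embed_dim=512, num_vertices=2562, hidden=1024):
        super().__init__()
        self.num_vertices = num_vertices
        self.backbone = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.vertex_head = nn.Linear(hidden, num_vertices * 3)  # geometry
        self.color_head = nn.Linear(hidden, num_vertices * 3)   # texture

    def forward(self, text_embedding):
        h = self.backbone(text_embedding)
        verts = self.vertex_head(h).view(-1, self.num_vertices, 3)
        colors = torch.sigmoid(self.color_head(h)).view(-1, self.num_vertices, 3)
        return verts, colors


model = AmortizedTextTo3D()
z = torch.randn(1, 512)  # stand-in for a real text encoder output (e.g. CLIP)

# Amortized inference: a single forward pass, no per-prompt optimization loop.
with torch.no_grad():
    verts, colors = model(z)

# Optional fast test-time refinement: briefly optimize the prediction for
# this one prompt. The smoothness-style penalty below is a placeholder for
# a real rendering- or diffusion-guided objective.
verts = verts.clone().requires_grad_(True)
opt = torch.optim.Adam([verts], lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    loss = verts.diff(dim=1).pow(2).mean()  # placeholder objective
    loss.backward()
    opt.step()
```

The design trade-off this sketch illustrates is the one the abstract highlights: amortization moves the expensive per-prompt optimization into a shared training phase, so inference reduces to one cheap forward pass, with a short optional refinement loop recovering extra per-prompt quality.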