Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
October 11, 2024
Authors: Ling Yang, Zixiang Zhang, Junlin Han, Bohan Zeng, Runjia Li, Philip Torr, Wentao Zhang
cs.AI
Abstract
Generating high-quality 3D assets from textual descriptions remains a pivotal
challenge in computer graphics and vision research. Due to the scarcity of 3D
data, state-of-the-art approaches utilize pre-trained 2D diffusion priors,
optimized through Score Distillation Sampling (SDS). Despite progress, crafting
complex 3D scenes featuring multiple objects or intricate interactions is still
difficult. To tackle this, recent methods have incorporated box or layout
guidance. However, these layout-guided compositional methods often struggle to
provide fine-grained control, as they are generally coarse and lack
expressiveness. To overcome these challenges, we introduce a novel SDS
approach, Semantic Score Distillation Sampling (SemanticSDS), designed to
effectively improve the expressiveness and accuracy of compositional text-to-3D
generation. Our approach integrates new semantic embeddings that maintain
consistency across different rendering views and clearly differentiate between
various objects and parts. These embeddings are transformed into a semantic
map, which directs a region-specific SDS process, enabling precise optimization
and compositional generation. By leveraging explicit semantic guidance, our
method unlocks the compositional capabilities of existing pre-trained diffusion
models, thereby achieving superior quality in 3D content generation,
particularly for complex objects and scenes. Experimental results demonstrate
that our SemanticSDS framework is highly effective for generating
state-of-the-art complex 3D content. Code:
https://github.com/YangLing0818/SemanticSDS-3D
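
For context, the standard Score Distillation Sampling objective that SemanticSDS builds on (introduced in DreamFusion) optimizes 3D parameters $\theta$ by backpropagating a denoising residual from a frozen 2D diffusion model through a differentiable renderer $g$. A common formulation of its gradient is:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = g(\theta))
= \mathbb{E}_{t,\,\epsilon}\!\left[
    w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)
    \frac{\partial x}{\partial \theta}
\right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon,
```

where $x = g(\theta)$ is a rendered view, $\hat{\epsilon}_\phi$ is the pre-trained diffusion model's noise prediction conditioned on the text prompt $y$, $\epsilon \sim \mathcal{N}(0, I)$ is the injected noise, and $w(t)$ is a timestep-dependent weight. This is the generic SDS gradient, not the paper's exact formulation; per the abstract, SemanticSDS further renders semantic embeddings into a semantic map and applies this optimization region by region, so different objects and parts receive guidance from their own prompts.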