Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

Abstract

Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research. Because 3D data is scarce, state-of-the-art approaches rely on pre-trained 2D diffusion priors, optimized through Score Distillation Sampling (SDS). Despite this progress, creating complex 3D scenes that feature multiple objects or intricate interactions is still difficult. Recent methods address this by incorporating box or layout guidance. However, such guidance is generally coarse and lacks expressiveness, so these layout-guided compositional methods often struggle to provide fine-grained control. To overcome these limitations, we introduce a novel SDS approach, Semantic Score Distillation Sampling (SemanticSDS), designed to improve the expressiveness and accuracy of compositional text-to-3D generation. Our approach integrates new semantic embeddings that remain consistent across rendering views and clearly differentiate between objects and parts. These embeddings are transformed into a semantic map, which directs a region-specific SDS process and enables precise optimization and compositional generation. By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models and achieves superior quality in 3D content generation, particularly for complex objects and scenes. Experimental results demonstrate that our SemanticSDS framework is highly effective at generating state-of-the-art complex 3D content. Code: https://github.com/YangLing0818/SemanticSDS-3D
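
For orientation, the relationship between standard SDS and a region-specific variant can be written out as a short sketch. The first equation is the widely used SDS gradient, where x = g(θ) is an image rendered from 3D parameters θ, y is the text prompt, and ε̂_φ is the noise prediction of the pre-trained diffusion model. The second and third equations are only an illustrative assumption of how a semantic map could steer the process: the region masks M_k, the per-region prompts y_k, and the number of regions K are notation introduced here for the example and are not taken from the abstract, so the actual SemanticSDS formulation may differ (for instance, in where the composition is applied).

```latex
% Standard SDS gradient, with x = g(\theta) and x_t = \alpha_t x + \sigma_t \epsilon
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

% Illustrative assumption: the rendered semantic map yields K masks M_k
% (one per object or part, with \sum_k M_k = 1 at every pixel), and each
% region k is paired with its own sub-prompt y_k.
\hat{\epsilon}_{\mathrm{comp}}(x_t;\, t)
  = \sum_{k=1}^{K} M_k \odot \hat{\epsilon}_\phi(x_t;\, y_k,\, t)

% Region-specific gradient under that assumption
\nabla_\theta \mathcal{L}_{\mathrm{region}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\mathrm{comp}}(x_t;\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Under this reading, each pixel is pulled toward the prompt of the region it belongs to, which is one way a semantic map could give finer control than a coarse box layout.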
