Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

Abstract

Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.
