Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation
August 1, 2025
Authors: Nan Xiang, Tianyi Liang, Haiwen Huang, Shiqi Jiang, Hao Huang, Yifei Huang, Liangyu Chen, Changbo Wang, Chenhui Li
cs.AI
Abstract
Text-to-3D (T23D) generation has transformed digital content creation, yet
remains bottlenecked by blind trial-and-error prompting processes that yield
unpredictable results. While visual prompt engineering has advanced in
text-to-image domains, its application to 3D generation presents unique
challenges requiring multi-view consistency evaluation and spatial
understanding. We present Sel3DCraft, a visual prompt engineering system for
T23D that transforms unstructured exploration into a guided visual process. Our
approach introduces three key innovations: a dual-branch structure combining
retrieval and generation for diverse candidate exploration; a multi-view hybrid
scoring approach that leverages multimodal large language models (MLLMs) with
innovative high-level metrics to
assess 3D models with human-expert consistency; and a prompt-driven visual
analytics suite that enables intuitive defect identification and refinement.
Extensive testing and user studies demonstrate that Sel3DCraft surpasses other
T23D systems in supporting creativity for designers.