Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

August 1, 2025
Authors: Nan Xiang, Tianyi Liang, Haiwen Huang, Shiqi Jiang, Hao Huang, Yifei Huang, Liangyu Chen, Changbo Wang, Chenhui Li
cs.AI

Abstract

Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.