Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation
August 1, 2025
Authors: Nan Xiang, Tianyi Liang, Haiwen Huang, Shiqi Jiang, Hao Huang, Yifei Huang, Liangyu Chen, Changbo Wang, Chenhui Li
cs.AI
Abstract
Text-to-3D (T23D) generation has transformed digital content creation, yet
remains bottlenecked by blind trial-and-error prompting processes that yield
unpredictable results. While visual prompt engineering has advanced in
text-to-image domains, its application to 3D generation presents unique
challenges requiring multi-view consistency evaluation and spatial
understanding. We present Sel3DCraft, a visual prompt engineering system for
T23D that transforms unstructured exploration into a guided visual process. Our
approach introduces three key innovations: a dual-branch structure combining
retrieval and generation for diverse candidate exploration; a multi-view hybrid
scoring approach that leverages MLLMs with innovative high-level metrics to
assess 3D models with human-expert consistency; and a prompt-driven visual
analytics suite that enables intuitive defect identification and refinement.
Extensive testing and user studies demonstrate that Sel3DCraft surpasses other
T23D systems in supporting creativity for designers.