Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

August 1, 2025
Authors: Nan Xiang, Tianyi Liang, Haiwen Huang, Shiqi Jiang, Hao Huang, Yifei Huang, Liangyu Chen, Changbo Wang, Chenhui Li
cs.AI

Abstract

Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.