Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

April 16, 2026
Authors: Victoria Yue Chen, Emery Pierson, Léopold Maillard, Maks Ovsjanikov
cs.AI

Abstract

Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based editing, style transfer, or inverse problems. However, it relies on the assumption that generative models remain sensitive to natural language prompts. We demonstrate that for state-of-the-art native text-to-3D generative models, this assumption often collapses. We identify a critical failure mode where generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. In these regimes, changes to the input text fail to alter internal representations in a way that alters the output geometry. Crucially, we observe that this is not a limitation of the model's geometric expressivity; the same generative models possess the ability to produce a vast diversity of shapes but, as we demonstrate, become insensitive to out-of-distribution text guidance. We investigate this behavior by analyzing the sampling trajectories of the generative model, and find that complex geometries can still be represented and produced by leveraging the model's unconditional generative prior. This leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity. Our approach addresses the limitations of current 3D pipelines and enables high-fidelity semantic manipulation of out-of-distribution 3D shapes. Project webpage: https://daidedou.sorpi.fr/publication/beyondprompts