Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

Abstract

Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based editing, style transfer, and inverse problems. However, it relies on the assumption that generative models remain sensitive to natural language prompts. We demonstrate that for state-of-the-art native text-to-3D generative models, this assumption often collapses. We identify a critical failure mode where generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. In these regimes, changes to the input text fail to alter internal representations in a way that changes the output geometry. Crucially, we observe that this is not a limitation of the model's geometric expressivity; the same generative models can produce a vast diversity of shapes but, as we demonstrate, become insensitive to out-of-distribution text guidance. We investigate this behavior by analyzing the sampling trajectories of the generative model, and find that complex geometries can still be represented and produced by leveraging the model's unconditional generative prior. This leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity. Our approach addresses the limitations of current 3D pipelines and enables high-fidelity semantic manipulation of out-of-distribution 3D shapes. Project webpage: https://daidedou.sorpi.fr/publication/beyondprompts
