ChatPaper.ai


Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

April 29, 2026
作者: Etai Sella, Hao Phung, Nitay Amiel, Or Litany, Or Patashnik, Hadar Averbuch-Elor
cs.AI

Abstract

Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
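The pipeline described above (abstract a shape into primitives, edit a single primitive, leave the rest untouched) can be illustrated with a minimal sketch. All names and structures here are hypothetical, invented for illustration; the paper's actual primitive representation and VLM-driven editing are far richer than this toy.

```python
# Hypothetical sketch of the primitive-level editing idea (all names invented):
# 1) a shape is abstracted as a list of simple primitives,
# 2) a structural edit targets exactly one primitive,
# 3) every other primitive passes through unchanged, so unedited
#    regions of the original shape are preserved by construction.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Box:
    """A cuboid primitive: a part name, a center, and per-axis sizes."""
    name: str
    center: tuple
    size: tuple

def edit_primitives(primitives, target, transform):
    """Apply `transform` only to the primitive named `target`;
    all other primitives are returned unmodified."""
    return [transform(p) if p.name == target else p for p in primitives]

# A toy chair abstraction: seat, backrest, and one leg.
chair = [
    Box("seat", center=(0.0, 0.5, 0.0), size=(1.0, 0.1, 1.0)),
    Box("back", center=(0.0, 1.0, -0.45), size=(1.0, 1.0, 0.1)),
    Box("leg",  center=(0.4, 0.25, 0.4), size=(0.1, 0.5, 0.1)),
]

# A structural edit a VLM might specify: "make the backrest taller".
def taller_back(p):
    sx, sy, sz = p.size
    return replace(p, size=(sx, sy * 1.5, sz))

edited = edit_primitives(chair, "back", taller_back)
```

In the actual framework, the edited abstraction then conditions a 3D generative model; here the point is only that localized structural change and identity preservation fall out naturally from operating at the primitive level.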
PDF · May 5, 2026