Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Abstract

Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that relies heavily on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
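
To make the idea of a primitive-level edit concrete, the following is a minimal sketch and not the Prox-E implementation. It assumes a hypothetical abstraction made of named, axis-aligned boxes; the abstract does not specify the primitive type, the edit format, or how the VLM emits changes, so `Primitive`, `scale_primitive`, and `changed_names` are illustrative names only. The sketch shows how one localized edit can be applied to a single primitive and how the untouched primitives can be identified as regions to preserve.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    """One box-like primitive in a hypothetical shape abstraction."""
    name: str     # illustrative part label, e.g. "seat"
    center: Vec3  # box center
    size: Vec3    # full extents along x, y, z


def scale_primitive(prims: List[Primitive], name: str, factors: Vec3) -> List[Primitive]:
    """Return a new list in which only the named primitive has its size scaled."""
    out = []
    for p in prims:
        if p.name == name:
            new_size = tuple(s * f for s, f in zip(p.size, factors))
            out.append(replace(p, size=new_size))
        else:
            out.append(p)
    return out


def changed_names(before: List[Primitive], after: List[Primitive]) -> List[str]:
    """Names of primitives that differ between two abstractions, matched by name."""
    before_by_name = {p.name: p for p in before}
    return [p.name for p in after if before_by_name.get(p.name) != p]


# Toy abstraction of a chair (values are made up for illustration).
chair = [
    Primitive("seat", (0.0, 0.5, 0.0), (1.0, 0.1, 1.0)),
    Primitive("back", (0.0, 1.0, -0.45), (1.0, 1.0, 0.1)),
]

# A stand-in for an instruction such as "make the backrest taller".
edited = scale_primitive(chair, "back", (1.0, 1.5, 1.0))

edited_parts = changed_names(chair, edited)
preserved_parts = [p.name for p in chair if p.name not in edited_parts]
```

In a pipeline like the one described above, a list such as `preserved_parts` could indicate which regions a downstream 3D generative model should leave unchanged; how that guidance is actually applied is not stated in the abstract.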
