Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Abstract

Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that relies heavily on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
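
To make the idea of a primitive-level edit concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the primitive type or the edit format, so the sketch assumes axis-aligned cuboids and a per-axis scale edit. The names `Primitive`, `PrimitiveEdit`, and `apply_edits` are hypothetical, and neither the VLM that would propose the edit nor the 3D generative model that would consume it is modeled.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    """Hypothetical primitive: an axis-aligned cuboid (an assumption)."""
    name: str
    center: Vec3
    size: Vec3


@dataclass(frozen=True)
class PrimitiveEdit:
    """Hypothetical edit format: scale one named primitive along each axis."""
    target: str
    scale: Vec3


def apply_edits(
    primitives: List[Primitive], edits: List[PrimitiveEdit]
) -> Tuple[List[Primitive], Set[str]]:
    """Apply edits and return the new primitives plus the names that changed.

    Primitives without a matching edit are returned unchanged, which is the
    part of the shape a downstream generator would be asked to preserve.
    """
    by_target = {e.target: e for e in edits}
    edited: List[Primitive] = []
    changed: Set[str] = set()
    for p in primitives:
        e = by_target.get(p.name)
        if e is None:
            edited.append(p)
            continue
        sx, sy, sz = (s * k for s, k in zip(p.size, e.scale))
        edited.append(replace(p, size=(sx, sy, sz)))
        changed.add(p.name)
    return edited, changed


# Illustrative input: a chair abstracted into three cuboids (made-up values).
chair = [
    Primitive("seat", (0.0, 0.0, 0.5), (0.5, 0.5, 0.05)),
    Primitive("back", (0.0, -0.25, 1.0), (0.5, 0.05, 0.5)),
    Primitive("legs", (0.0, 0.0, 0.25), (0.5, 0.5, 0.5)),
]

# An edit such as "make the backrest taller", expressed at the primitive level.
edited, changed = apply_edits(chair, [PrimitiveEdit("back", (1.0, 1.0, 2.0))])
preserved = [p.name for p in edited if p.name not in changed]
```

In this sketch, `changed` marks the region a generator would be allowed to modify, while `preserved` lists the primitives whose geometry should stay fixed.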
