Towards Scalable and Consistent 3D Editing

Abstract

3D editing, the task of locally modifying the geometry or appearance of a 3D asset, has wide applications in immersive content creation, digital entertainment, and AR/VR. However, unlike 2D editing, it remains challenging because edits must preserve cross-view consistency, structural fidelity, and fine-grained controllability. Existing approaches are often slow, prone to geometric distortions, or dependent on accurate, manually specified 3D masks, which are error-prone and impractical to obtain. To address these challenges, we make advances on both the data and model fronts. On the data side, we introduce 3DEditVerse, the largest paired 3D editing benchmark to date, comprising 116,309 high-quality training pairs and 1,500 curated test pairs. Built through complementary pipelines of pose-driven geometric edits and foundation-model-guided appearance edits, 3DEditVerse ensures edit locality, multi-view consistency, and semantic alignment. On the model side, we propose 3DEditFormer, a 3D-structure-preserving conditional transformer. By enhancing image-to-3D generation with dual-guidance attention and time-adaptive gating, 3DEditFormer disentangles editable regions from preserved structure, enabling precise and consistent edits without requiring auxiliary 3D masks. Extensive experiments demonstrate that our framework outperforms state-of-the-art baselines both quantitatively and qualitatively, establishing a new standard for practical and scalable 3D editing. Dataset and code will be released. Project: https://www.lv-lab.org/3DEditFormer/
