SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

Abstract


Image spatial editing performs geometry-driven transformations, enabling precise control over object layout and camera viewpoint. Current models fall short on fine-grained spatial manipulation, which motivates a dedicated evaluation suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a comprehensive benchmark that evaluates spatial editing by jointly measuring perceptual plausibility and geometric fidelity through viewpoint reconstruction and framing analysis. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object-centric and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing. It achieves competitive performance on general editing while substantially outperforming prior methods on spatial manipulation tasks. All resources will be made publicly available at https://github.com/EasonXiao-888/SpatialEdit.
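The abstract does not spell out how geometric fidelity is scored. As an illustration only, the sketch below assumes that the check compares the camera rotation requested by an edit instruction (for example, "orbit the camera 30 degrees") with the rotation recovered by viewpoint reconstruction from the source and edited images, and scores the gap with the standard geodesic distance between rotation matrices. The helper names `rot_y` and `rotation_error_deg` are hypothetical and are not taken from the SpatialEdit code.

```python
import math

Matrix = list[list[float]]


def rot_y(deg: float) -> Matrix:
    """Rotation about the vertical (y) axis, e.g. a camera orbit step (hypothetical helper)."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def _transpose_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T @ b for 3x3 matrices without external dependencies."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_error_deg(r_target: Matrix, r_est: Matrix) -> float:
    """Geodesic angle (degrees) between the requested and the reconstructed rotation.

    Uses theta = arccos((trace(R_target^T R_est) - 1) / 2), clamping the
    argument to [-1, 1] to guard against floating-point drift.
    """
    m = _transpose_matmul(r_target, r_est)
    trace = m[0][0] + m[1][1] + m[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Instruction asks for a 30 degree orbit; reconstruction recovers 25 degrees.
    print(round(rotation_error_deg(rot_y(30), rot_y(25)), 6))
```

Because the geodesic distance depends only on the relative rotation, the score is independent of the world frame chosen by the reconstruction step, which is one reason it is a common choice for this kind of comparison.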
