SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
April 6, 2026
Authors: Yicheng Xiao, Wenhu Zhang, Lin Song, Yukang Chen, Wenbo Li, Nan Jiang, Tianhe Ren, Haokun Lin, Wei Huang, Haoyang Huang, Xiu Li, Nan Duan, Xiaojuan Qi
cs.AI
Abstract
Image spatial editing applies geometry-driven transformations to an image, enabling precise control over object layout and camera viewpoint. Current models struggle with fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are threefold: (i) We introduce SpatialEdit-Bench, a comprehensive benchmark that evaluates spatial editing by jointly measuring perceptual plausibility and geometric fidelity via viewpoint reconstruction and framing analysis. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object-centric and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing that achieves competitive performance on general editing while substantially outperforming prior methods on spatial manipulation tasks. All resources will be made publicly available at https://github.com/EasonXiao-888/SpatialEdit.
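To make the notion of a "precise ground-truth transformation" for a camera-centric edit concrete, the following is a minimal, illustrative sketch (not the paper's actual pipeline): a camera-to-world pose is stored as a 4x4 homogeneous matrix, and an edit such as "orbit 30 degrees and dolly back 0.5 units" is recorded as a composition of rotation and translation matrices. All helper names and the specific parameterization are assumptions for illustration only.

```python
import math

def yaw_rotation(deg):
    # 4x4 homogeneous rotation about the world up (y) axis
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translation(tx, ty, tz):
    # 4x4 homogeneous translation
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    # plain 4x4 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]

# Start from an identity camera-to-world pose, then apply a
# hypothetical ground-truth edit: orbit the camera 30 degrees
# and translate it back by 0.5 units along z.
pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
edited = matmul(translation(0.0, 0.0, -0.5), matmul(yaw_rotation(30), pose))
```

Recording edits as explicit matrices like this is what makes geometric-fidelity metrics (e.g. viewpoint reconstruction error) well defined: the predicted pose change can be compared directly against the stored ground truth.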