

UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits

December 1, 2025
Authors: Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing Lyu, Zhou Zhao, Shengyu Zhang
cs.AI

Abstract

With the rapid advances of powerful multimodal models such as GPT-4o, Nano Banana, and Seedream 4.0 in image editing, the performance gap between closed-source and open-source models is widening, primarily due to the scarcity of large-scale, high-quality training data and of comprehensive benchmarks capable of diagnosing model weaknesses across diverse editing behaviors. Existing data construction methods face a scale-quality trade-off: human annotations are high-quality but not scalable, while automated pipelines suffer from error propagation and noise. To address this, we introduce a lightweight data pipeline that replaces multi-tool chains with an end-to-end model and a unified post-verification stage. For scalable quality control, we train a 7B dual-task expert model, Qwen-Verify, for efficient failure detection and instruction recaptioning. This pipeline yields UnicEdit-10M, a 10M-scale dataset spanning diverse basic and complex editing tasks. We also propose UnicBench, a general benchmark that extends beyond basic edits to explicitly assess spatial and knowledge-driven reasoning. To enable fine-grained diagnosis, we introduce novel metrics, including Non-edit Consistency and Reasoning Accuracy. Our analysis of mainstream models on UnicBench reveals their limitations and provides clear directions for future research.
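The abstract names a Non-edit Consistency metric but does not spell out how it is computed. A common way to operationalize such a metric is to score pixel fidelity between the source and edited images restricted to the region outside an edit mask; the sketch below illustrates that idea with a masked PSNR. This is a minimal illustration under assumed conventions (mask-based region definition, PSNR as the similarity measure, images in [0, 1]), not the paper's actual formulation.

```python
import numpy as np

def non_edit_consistency(src: np.ndarray, edited: np.ndarray,
                         edit_mask: np.ndarray) -> float:
    """PSNR over pixels outside the edit mask (higher = more consistent).

    src, edited: float arrays in [0, 1], shape (H, W, C).
    edit_mask:   bool array, shape (H, W); True marks edited pixels.

    Illustrative stand-in for a "Non-edit Consistency" score, NOT the
    metric definition from the UnicBench paper.
    """
    keep = ~edit_mask                      # non-edit region
    if not keep.any():
        return float("inf")                # everything was edited
    diff = src[keep] - edited[keep]        # (N, C) pixels outside the mask
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")                # untouched outside the mask
    return 10.0 * np.log10(1.0 / mse)      # PSNR for a [0, 1] value range

# Example: an edit confined to the mask leaves the score at infinity,
# while leakage outside the mask yields a finite PSNR.
rng = np.random.default_rng(0)
src = rng.random((8, 8, 3))
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True

clean_edit = src.copy()
clean_edit[mask] = 0.0                     # edit only inside the mask
print(non_edit_consistency(src, clean_edit, mask))   # inf

leaky_edit = np.clip(src + 0.1, 0.0, 1.0)  # global shift leaks outside
print(non_edit_consistency(src, leaky_edit, mask))   # finite PSNR
```

In practice the edit mask itself must come from somewhere (instruction grounding, a segmentation model, or the pipeline's own annotations), and a perceptual measure such as SSIM or LPIPS over the same region is a drop-in alternative to PSNR.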