ChatPaper.ai


UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits

December 1, 2025
Authors: Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing Lyu, Zhou Zhao, Shengyu Zhang
cs.AI

Abstract

With the rapid advances of powerful multimodal models such as GPT-4o, Nano Banana, and Seedream 4.0 in Image Editing, the performance gap between closed-source and open-source models is widening, primarily due to the scarcity of large-scale, high-quality training data and comprehensive benchmarks capable of diagnosing model weaknesses across diverse editing behaviors. Existing data construction methods face a scale-quality trade-off: human annotations are high-quality but not scalable, while automated pipelines suffer from error propagation and noise. To address this, we introduce a lightweight data pipeline that replaces multi-toolchains with an end-to-end model and a unified post-verification stage. For scalable quality control, we train a 7B dual-task expert model, Qwen-Verify, for efficient failure detection and instruction recaptioning. This pipeline yields UnicEdit-10M, a 10M-scale dataset spanning diverse basic and complex editing tasks. We also propose UnicBench, a general benchmark that extends beyond basic edits to explicitly assess spatial and knowledge-driven reasoning. To enable fine-grained diagnosis, we introduce novel metrics, including Non-edit Consistency and Reasoning Accuracy. Our analysis of mainstream models on UnicBench reveals their limitations and provides clear directions for future research.
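The abstract introduces Non-edit Consistency as a fine-grained metric: the idea is to check that pixels outside the instructed edit region remain unchanged. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch under assumed conventions (images as float arrays in [0, 1], a boolean mask marking the edited region, and a simple mean-absolute-difference score); the function name and scoring rule are hypothetical, not the authors' definition.

```python
import numpy as np

def non_edit_consistency(src, edited, edit_mask):
    """Hypothetical sketch of a non-edit-region consistency score.

    src, edited: float arrays in [0, 1] with shape (H, W, C).
    edit_mask:   bool array with shape (H, W); True marks the edited region.
    Returns a score in [0, 1]; 1.0 means the non-edit region is untouched.
    """
    keep = ~edit_mask  # pixels the edit instruction should leave unchanged
    if not keep.any():  # whole image was edited: score is undefined
        return float("nan")
    # Mean absolute per-pixel deviation outside the edited region.
    diff = np.abs(src[keep] - edited[keep]).mean()
    return float(1.0 - diff)
```

A perfect edit that only touches masked pixels scores 1.0, while an edit that rewrites the whole image scores near 0.0; a production metric would more likely use a perceptual similarity (e.g. SSIM or a learned feature distance) over the unmasked region rather than raw pixel differences.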
PDF · December 4, 2025