Uni-Edit: Intelligent Editing Is a General Task for Unified Model Tuning
May 20, 2026
Authors: Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li
cs.AI
Abstract
Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities relies mainly on mixed multi-task training. Because of inherent task conflicts, such a strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, and it yields only a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tuning. Unlike complex mixed pipelines, Uni-Edit improves all three abilities at once using only one task, one training stage, and one dataset. Specifically, we first identify image editing as an inherently ideal general task, as it naturally demands both visual understanding and generation. However, existing editing data relies on simplistic instructions that severely underutilize a model's understanding capacity. To address this, we introduce the first automated and scalable data synthesis pipeline for intelligent editing, which transforms diverse VQA data into complex and effective editing instructions with embedded questions and nested logic. This yields Uni-Edit-148k, a dataset that pairs diverse reasoning-intensive instructions with high-quality edited images. Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit yields comprehensive improvements across all three capabilities without any auxiliary operations.