EditThinker: Unlocking Iterative Reasoning for Any Image Editor

December 5, 2025
Authors: Hongyu Li, Manyuan Zhang, Dian Zheng, Ziyu Guo, Yimeng Jia, Kaituo Feng, Hao Yu, Yexin Liu, Yan Feng, Peng Pei, Xunliang Cai, Linjiang Huang, Hongsheng Li, Si Liu
cs.AI

Abstract

Instruction-based image editing has emerged as a prominent research area. Benefiting from image generation foundation models, it has achieved high aesthetic quality, which leaves instruction-following capability as the primary challenge. Existing approaches improve instruction adherence via supervised or reinforcement learning, yet single-turn success rates remain limited due to inherent stochasticity and a lack of deliberation. In this work, we propose a deliberative editing framework that lets models "think" while they edit. It simulates the human cognitive loop by iteratively executing a Think-while-Edit cycle: critiquing the result and refining the instruction, then repeating the generation until the result is satisfactory. Specifically, we train a single multimodal large language model (MLLM), EditThinker, to act as the reasoning engine of this framework; it jointly produces the critique score, the reasoning process, and the refined instruction. We employ reinforcement learning to align EditThinker's thinking with its editing, thereby generating more targeted instruction improvements. Extensive experiments on four benchmarks demonstrate that our approach improves the instruction-following capability of various image editing models by a large margin. We will release our data construction framework, datasets, and models to benefit the community.
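
The Think-while-Edit cycle described in the abstract can be sketched as a simple control loop. This is a minimal illustration, not the authors' implementation. The names `think_while_edit`, `Critique`, `editor`, and `thinker`, the `threshold` and `max_turns` parameters, the 0-to-1 score scale, and the choice to re-edit the source image each turn and keep the best-scoring result are all assumptions made for the example. The toy editor and thinker are string-based stand-ins for a real image editor and an MLLM.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Critique:
    """What the reasoning model returns each turn (field names are hypothetical)."""
    score: float              # how well the edit follows the instruction
    reasoning: str            # explanation behind the score
    refined_instruction: str  # rewritten instruction for the next attempt


Editor = Callable[[Any, str], Any]                  # (image, instruction) -> edited image
Thinker = Callable[[Any, Any, str, str], Critique]  # (source, edited, original, current)


def think_while_edit(
    source: Any,
    instruction: str,
    editor: Editor,
    thinker: Thinker,
    threshold: float = 0.8,  # assumed stopping score, not taken from the paper
    max_turns: int = 4,      # assumed turn budget, not taken from the paper
) -> Tuple[Any, float, List[Critique]]:
    current = instruction
    best_image, best_score = None, float("-inf")
    history: List[Critique] = []
    for _ in range(max_turns):
        edited = editor(source, current)                         # edit
        critique = thinker(source, edited, instruction, current) # critique
        history.append(critique)
        if critique.score > best_score:
            best_image, best_score = edited, critique.score
        if critique.score >= threshold:                          # satisfactory: stop
            break
        current = critique.refined_instruction                   # refine, then repeat
    return best_image, best_score, history


# Toy stand-ins so the sketch runs without any model.
def toy_editor(image: str, instruction: str) -> str:
    return f"{image} + [{instruction}]"


def toy_thinker(source: str, edited: str, original: str, current: str) -> Critique:
    if "bright red" in edited:
        return Critique(1.0, "The color is specified precisely.", current)
    return Critique(0.3, "The color is ambiguous.", current + ", in bright red")


if __name__ == "__main__":
    image, score, history = think_while_edit(
        "photo", "recolor the car", toy_editor, toy_thinker
    )
    print(image, score, len(history))
```

Because the loop only calls `editor` as a black box, the same wrapper applies to any editing model, which matches the "any image editor" framing of the title. How the actual system scores edits, decides when to stop, and chooses which image to return is not specified in the abstract.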