From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
February 24, 2026
Authors: Xiangyan Qu, Zhenlong Yuan, Jing Tang, Rui Chen, Datao Tang, Meng Yu, Lei Sun, Yancheng Bai, Xiangxiang Chu, Gaopeng Gou, Gang Xiong, Yujun Cai
cs.AI
Abstract
Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by extending inference time. Most Image-CoT methods target text-to-image (T2I) generation. Image editing, by contrast, is goal-directed: the solution space is constrained by both the source image and the editing instruction. This mismatch raises three challenges when applying Image-CoT to editing: inefficient resource allocation under fixed sampling budgets, unreliable early-stage verification with general-purpose MLLM scores, and redundant edited results from large-scale sampling. To address these challenges, we propose ADaptive Edit-CoT (ADE-CoT), an on-demand test-time scaling framework that improves both editing efficiency and performance. It incorporates three key strategies: (1) difficulty-aware resource allocation, which assigns dynamic budgets based on estimated edit difficulty; (2) edit-specific verification for early pruning, which uses region localization and caption consistency to select promising candidates; and (3) depth-first opportunistic stopping, guided by an instance-specific verifier, which terminates once intent-aligned results are found. Extensive experiments with three SOTA editing models (Step1X-Edit, BAGEL, FLUX.1 Kontext) on three benchmarks show that ADE-CoT achieves superior performance-efficiency trade-offs. With comparable sampling budgets, ADE-CoT achieves better performance than Best-of-N with more than 2x speedup.
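The three strategies in the abstract can be illustrated with a rough control-flow sketch. This is not the authors' implementation: the function names (`allocate_budget`, `adaptive_edit_search`), the linear budget rule, the `keep_ratio` and `accept_threshold` values, and the two scoring callbacks are all hypothetical stand-ins chosen for illustration. The abstract does not specify how difficulty is estimated or how the verifiers are built, so they appear here only as plain callables over candidate seeds.

```python
import math
from typing import Callable, Optional, Tuple


def allocate_budget(difficulty: float, min_n: int = 2, max_n: int = 16) -> int:
    """Map an estimated edit difficulty in [0, 1] to a sampling budget.

    Assumption: a simple linear rule; the paper's actual rule is not given here.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return min_n + math.ceil(d * (max_n - min_n))


def adaptive_edit_search(
    difficulty: float,
    early_score: Callable[[int], float],  # hypothetical cheap early-stage verifier
    final_score: Callable[[int], float],  # hypothetical verifier on finished edits
    keep_ratio: float = 0.5,              # assumed fraction kept after pruning
    accept_threshold: float = 0.9,        # assumed "good enough" score
) -> Tuple[Optional[int], int]:
    """Return (best candidate seed, number of candidates fully generated)."""
    # (1) Difficulty-aware allocation: harder edits get more candidates.
    budget = allocate_budget(difficulty)
    seeds = list(range(budget))

    # (2) Early pruning: rank candidates by the cheap score, keep the top fraction.
    ranked = sorted(seeds, key=early_score, reverse=True)
    kept = ranked[: max(1, math.ceil(keep_ratio * budget))]

    # (3) Depth-first opportunistic stopping: finish one candidate at a time,
    # most promising first, and stop as soon as one passes the threshold.
    best_seed, best, completed = None, float("-inf"), 0
    for seed in kept:
        score = final_score(seed)
        completed += 1
        if score > best:
            best_seed, best = seed, score
        if score >= accept_threshold:
            break
    return best_seed, completed
```

With a fixed-budget Best-of-N baseline, every candidate would be fully generated before selection. In this sketch the number of completed candidates depends on the difficulty estimate and on how early an acceptable result appears, which is the kind of saving the abstract attributes to on-demand scaling.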