From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Abstract

Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by extending inference time. Most existing Image-CoT methods focus on text-to-image (T2I) generation. Unlike T2I generation, image editing is goal-directed: the solution space is constrained by the source image and the instruction. This mismatch raises three challenges when Image-CoT is applied to editing: inefficient resource allocation under fixed sampling budgets, unreliable early-stage verification with general MLLM scores, and redundant edited results from large-scale sampling. To address these challenges, we propose ADaptive Edit-CoT (ADE-CoT), an on-demand test-time scaling framework that improves both editing efficiency and performance. It incorporates three key strategies: (1) difficulty-aware resource allocation, which assigns dynamic budgets based on estimated edit difficulty; (2) edit-specific verification in early pruning, which uses region localization and caption consistency to select promising candidates; and (3) depth-first opportunistic stopping, guided by an instance-specific verifier, which terminates as soon as intent-aligned results are found. Extensive experiments with three state-of-the-art editing models (Step1X-Edit, BAGEL, FLUX.1 Kontext) on three benchmarks show that ADE-CoT achieves superior performance-efficiency trade-offs. Under comparable sampling budgets, ADE-CoT outperforms Best-of-N while running more than 2× faster.
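The abstract names the three strategies but not their concrete estimators, verifiers, or thresholds. As a rough illustration of how such strategies could compose into a single control loop, here is a minimal Python sketch under stated assumptions: the linear difficulty-to-budget rule, the idea that early pruning scores cheap partial candidates, and every name (`allocate_budget`, `adaptive_edit_search`, `draft`, `early_score`, `finish`, `final_score`, `accept`, `keep`) are hypothetical and are not ADE-CoT's actual implementation. Region localization and caption consistency are abstracted away into the `early_score` callable.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Result:
    best: Any                # best fully edited candidate found (None if nothing was finished)
    best_score: float        # its score under the final verifier
    full_generations: int    # how many candidates were taken to completion


def allocate_budget(difficulty: float, min_budget: int, max_budget: int) -> int:
    """Hypothetical rule: map an estimated difficulty in [0, 1] linearly to a sampling budget."""
    difficulty = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range estimates
    return min_budget + round(difficulty * (max_budget - min_budget))


def adaptive_edit_search(
    difficulty: float,
    draft: Callable[[int], Any],             # hypothetical: seed -> cheap partial candidate
    early_score: Callable[[Any], float],     # hypothetical early-stage verifier
    finish: Callable[[Any], Any],            # hypothetical: partial candidate -> full edit
    final_score: Callable[[Any], float],     # hypothetical instance-specific verifier
    accept: float,                           # stop once a full edit scores >= accept
    keep: int,                               # number of drafts surviving early pruning
    min_budget: int = 4,
    max_budget: int = 16,
) -> Result:
    # (1) Difficulty-aware allocation: harder edits get more drafts.
    budget = allocate_budget(difficulty, min_budget, max_budget)
    drafts = [draft(seed) for seed in range(budget)]
    # (2) Early pruning: keep only the `keep` highest-scoring drafts (stable sort).
    ranked = sorted(drafts, key=early_score, reverse=True)[:keep]
    best, best_score, done = None, float("-inf"), 0
    # (3) Depth-first opportunistic stopping: finish one candidate at a time
    #     and stop as soon as one passes the acceptance threshold.
    for cand in ranked:
        full = finish(cand)
        done += 1
        score = final_score(full)
        if score > best_score:
            best, best_score = full, score
        if score >= accept:
            break
    return Result(best, best_score, done)


if __name__ == "__main__":
    # Toy stand-ins: candidates are integers and seed 5 is the most promising draft.
    result = adaptive_edit_search(
        difficulty=0.5,
        draft=lambda seed: seed,
        early_score=lambda x: -abs(x - 5),
        finish=lambda x: x * 10,
        final_score=lambda y: y / 100,
        accept=0.45,
        keep=3,
    )
    print(result)  # Result(best=50, best_score=0.5, full_generations=1)
```

With these toy callables, a difficulty of 0.5 yields 10 drafts, pruning keeps 3, and the loop stops after finishing a single candidate; a Best-of-N baseline with the same budget would finish all 10. The point of the sketch is that cost grows with how quickly an acceptable result appears rather than with a fixed N.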
