RewardHarness: Self-Evolving Agentic Post-Training

Abstract

Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference annotation and additional model training. This creates a data-efficiency gap: humans can often infer the target evaluation criteria from only a few examples, while models are usually trained on hundreds of thousands of comparisons. We present RewardHarness, a self-evolving agentic reward framework that reframes reward modeling as context evolution rather than weight optimization. Instead of learning from large-scale annotations, RewardHarness aligns with human preferences by iteratively evolving a library of tools and skills from as few as 100 preference demonstrations. Given a source image, candidate edited images, and an editing instruction, an Orchestrator selects the most relevant subset of tools and skills from the maintained library, and a frozen Sub-Agent uses them to construct a reasoning chain that produces a preference judgment. By comparing predicted judgments with ground-truth preferences and analyzing successes and failures in the reasoning process, the Orchestrator automatically refines its library of tools and skills without additional human annotation. Using only 0.05% of the EditReward preference data, RewardHarness achieves 47.4% average accuracy on image-editing evaluation benchmarks, surpassing GPT-5 by 5.3 percentage points. When RewardHarness is used as the reward signal for GRPO fine-tuning, the resulting RL-tuned models achieve a score of 3.52 on ImgEdit-Bench. Project page: https://rewardharness.com.
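The select, judge, and refine loop described above can be illustrated with a toy, runnable sketch. It rests on loud assumptions and is not the RewardHarness implementation. Skills are hand-written Python rules over hypothetical numeric cues (`instr` for instruction following, `keep` for preservation of the source) rather than tool calls over images. Keyword overlap stands in for the LLM Orchestrator's selection, and a "first committed verdict wins" rule stands in for the frozen Sub-Agent's reasoning chain. The Orchestrator's failure analysis is replaced by a greedy search over a fixed pool of candidate skills, scored against ground-truth preferences. All names (`Example`, `Skill`, `prefer`, `orchestrate`, `sub_agent`, `evolve`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Verdict = Optional[str]  # "A", "B", or None when a skill abstains


@dataclass
class Example:
    instruction: str                      # editing instruction
    signals: dict[str, dict[str, float]]  # hypothetical per-candidate cues standing in for images
    label: str                            # ground-truth human preference, "A" or "B"


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: frozenset[str]              # instruction words that make the skill relevant
    rule: Callable[[Example], Verdict]


def prefer(key: str, margin: float = 0.1) -> Callable[[Example], Verdict]:
    """Rule preferring the candidate with the higher `key` signal; abstains on near-ties."""
    def rule(ex: Example) -> Verdict:
        a, b = ex.signals["A"][key], ex.signals["B"][key]
        if abs(a - b) <= margin:
            return None
        return "A" if a > b else "B"
    return rule


def orchestrate(library: list[Skill], ex: Example, k: int = 2) -> list[Skill]:
    """Stand-in Orchestrator: pick up to k skills whose triggers best overlap the instruction."""
    words = set(ex.instruction.lower().split())
    ranked = sorted(library, key=lambda s: len(s.triggers & words), reverse=True)
    return [s for s in ranked[:k] if s.triggers & words]


def sub_agent(skills: list[Skill], ex: Example) -> str:
    """Stand-in frozen Sub-Agent: the first selected skill that commits decides; default to "A"."""
    for skill in skills:
        verdict = skill.rule(ex)
        if verdict is not None:
            return verdict
    return "A"


def accuracy(library: list[Skill], data: list[Example]) -> float:
    hits = sum(sub_agent(orchestrate(library, ex), ex) == ex.label for ex in data)
    return hits / len(data)


def evolve(library: list[Skill], candidates: list[Skill], data: list[Example],
           rounds: int = 3) -> list[Skill]:
    """Evolve the context, not weights: greedily add the candidate skill that most improves accuracy."""
    library = list(library)
    for _ in range(rounds):
        best, best_acc = None, accuracy(library, data)
        for cand in candidates:
            if cand in library:
                continue
            acc = accuracy(library + [cand], data)
            if acc > best_acc:
                best, best_acc = cand, acc
        if best is None:  # no remaining candidate improves agreement with the labels
            break
        library.append(best)
    return library


# Toy preference data (fabricated for illustration only).
data = [
    Example("add a hat to the person", {"A": {"instr": 0.9, "keep": 0.5}, "B": {"instr": 0.3, "keep": 0.9}}, "A"),
    Example("remove the car", {"A": {"instr": 0.4, "keep": 0.9}, "B": {"instr": 0.95, "keep": 0.6}}, "B"),
    Example("make the person smile", {"A": {"instr": 0.7, "keep": 0.2}, "B": {"instr": 0.65, "keep": 0.9}}, "B"),
    Example("change the background to a beach", {"A": {"instr": 0.8, "keep": 0.7}, "B": {"instr": 0.2, "keep": 0.8}}, "A"),
]
candidates = [
    Skill("instruction_following", frozenset({"add", "remove", "change", "make"}), prefer("instr")),
    Skill("identity_preservation", frozenset({"person", "face", "smile"}), prefer("keep")),
]
evolved = evolve([], candidates, data)
print(accuracy([], data), accuracy(evolved, data))  # 0.5 1.0
print([s.name for s in evolved])  # ['instruction_following', 'identity_preservation']
```

Only the skill list changes between rounds. Nothing inside `sub_agent` is trained, which mirrors the framing of reward modeling as context evolution rather than weight optimization.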
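The abstract does not say how pairwise preference judgments become the scalar rewards that GRPO consumes. One plausible scheme is assumed here rather than taken from the paper. Each sampled edit is scored by its win rate against the other edits in its group, and GRPO's group-relative normalization is then applied: subtract the group mean and divide by the group standard deviation. `judge` is a hypothetical stand-in for a call to the reward agent, and the edits are represented by made-up quality scores.

```python
import statistics
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def win_rate_rewards(edits: Sequence[T], judge: Callable[[T, T], str]) -> list[float]:
    """Hypothetical reward: each edit's fraction of pairwise wins within its group."""
    n = len(edits)
    if n < 2:
        raise ValueError("a group needs at least two edits to compare")
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # judge returns "A" if the first edit is preferred, otherwise "B"
            if judge(edits[i], edits[j]) == "A":
                wins[i] += 1
            else:
                wins[j] += 1
    return [w / (n - 1) for w in wins]


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize rewards within the group (population std here)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage: edits are represented by hypothetical quality scores.
edits = [0.2, 0.9, 0.5, 0.7]
rewards = win_rate_rewards(edits, lambda a, b: "A" if a >= b else "B")
advantages = group_advantages(rewards)
print(rewards)  # [0.0, 1.0, 0.3333333333333333, 0.6666666666666666]
```

Each pair is judged only once to keep the sketch short. LLM judges can show position bias, so querying both orders and averaging the two verdicts may be preferable in practice.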