Retrospective Harness Optimization: Improving LLM Agents through Self-Preference over Trajectory Rollouts

Abstract

AI agents rely on a harness of skills, tools, and workflows to solve complex problems, and continually improving this harness is essential for adapting to new tasks. Existing optimization methods, however, typically require ground-truth validation sets, and such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes the resulting rollouts using self-validation and self-consistency, generates candidate harness updates, and selects the most effective one through its own pairwise self-preference. We evaluate RHO across three domains: software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Our analysis further shows that RHO effectively targets prior failure modes; as a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.
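
Two steps of the procedure summarized above, coreset selection and pairwise selection among candidate updates, can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the abstract does not specify how difficulty or diversity are measured, nor how preferences are aggregated, so the `difficulty` scores, the `distance` function, the greedy farthest-point heuristic, the round-robin tournament, and the `prefer` callable (a stand-in for the agent's own judgment) are all hypothetical choices made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def select_coreset(
    tasks: Sequence[str],
    difficulty: Dict[str, float],
    distance: Callable[[str, str], float],
    k: int,
) -> List[str]:
    """Pick up to k tasks that are both hard and spread out.

    Assumption: `difficulty` and `distance` are supplied by the caller;
    the source does not define either quantity.
    """
    unique = list(dict.fromkeys(tasks))  # drop duplicates, keep order
    if k <= 0 or not unique:
        return []
    # Seed with the hardest task.
    chosen = [max(unique, key=lambda t: difficulty[t])]
    while len(chosen) < min(k, len(unique)):
        remaining = [t for t in unique if t not in chosen]
        # Score = difficulty times distance to the nearest chosen task.
        best = max(
            remaining,
            key=lambda t: difficulty[t] * min(distance(t, c) for c in chosen),
        )
        chosen.append(best)
    return chosen


def select_by_pairwise_preference(
    candidates: Sequence[str],
    prefer: Callable[[str, str], str],
) -> str:
    """Return the candidate with the most wins in a round-robin tournament.

    Assumption: `prefer(a, b)` returns whichever of its two arguments the
    judge (here, a hypothetical stand-in for the agent) prefers.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        # Ask in both orders, in case the judge favours the first option.
        for first, second in ((a, b), (b, a)):
            winner = prefer(first, second)
            if winner != first and winner != second:
                raise ValueError("prefer() must return one of its arguments")
            wins[winner] += 1
    # max() keeps the earliest candidate when win counts are tied.
    return max(candidates, key=lambda c: wins[c])
```

In a real setting `prefer` would wrap a model call that compares the rollouts produced under two candidate harnesses; here it is left as a plain callable so the selection logic can be exercised without any model.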