Socratic-SWE: Self-Evolving Coding Agents via Trajectory-Derived Agent Skills

Abstract

LLM-driven software engineering agents have become a central testbed for evaluating real-world language-model capabilities, yet their training remains limited by the scarcity of high-quality software engineering (SWE) tasks. Existing synthetic-data methods typically create tasks through fixed mutation or bug-injection pipelines, so the resulting task distributions are largely independent of the agent's own weaknesses and training progress. We introduce Socratic-SWE, a closed-loop self-evolution framework that reuses the agent's historical solving traces as a source of training signal. Rather than treating traces only as evidence for reward computation, Socratic-SWE distills them into structured agent skills that summarize recurring failure modes and effective repair patterns. These skills then guide the generation of targeted repair tasks in real repositories. Candidate tasks are checked through execution-based validation and scored with a solver-gradient alignment reward, so that the retained tasks are both verifiable and useful for improving the Solver. The updated Solver produces new traces, allowing the task curriculum to adapt over successive rounds. Across SWE-bench Verified, SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0, Socratic-SWE consistently outperforms self-evolving baselines under the same compute budget, reaching 50.40% on SWE-bench Verified after three iterations. These results suggest that solving traces can serve as a scalable substrate for self-evolving SWE agents.
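To make the closed loop concrete, the following is a minimal, toy Python sketch of one round: distill traces into skills, generate tasks from those skills, filter by validation, then rank by an alignment score. It is not the authors' implementation. Several pieces are assumptions made for illustration only. Skills are reduced to frequency-ranked failure tags. The task generator, validator, and gradients are stubs passed in as callables. The "solver-gradient alignment reward" is approximated as the cosine similarity between a candidate task's Solver gradient and a reference gradient. All names (`Trace`, `distill_skills`, `run_round`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Trace:
    """One historical solving attempt by the Solver (toy representation)."""
    task_id: str
    resolved: bool
    failure_tag: Optional[str] = None  # hypothetical label for the observed failure mode


def distill_skills(traces: Sequence[Trace]) -> List[str]:
    """Toy 'skill' distillation: rank failure modes by how often they recur."""
    counts: dict = {}
    for trace in traces:
        if not trace.resolved and trace.failure_tag:
            counts[trace.failure_tag] = counts.get(trace.failure_tag, 0) + 1
    # Most frequent failure first; ties broken alphabetically for determinism.
    return sorted(counts, key=lambda tag: (-counts[tag], tag))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def run_round(
    traces: Sequence[Trace],
    generate_tasks: Callable[[List[str]], List[str]],
    validate: Callable[[str], bool],
    solver_grad: Callable[[str], Sequence[float]],
    reference_grad: Sequence[float],
    k: int,
) -> Tuple[List[str], List[str]]:
    """One self-evolution round: traces -> skills -> tasks -> filter -> rank."""
    skills = distill_skills(traces)
    candidates = generate_tasks(skills)
    # Keep only tasks that pass (here: simulated) execution-based validation.
    verified = [task for task in candidates if validate(task)]
    # Assumed reward: alignment of the task's Solver gradient with a reference gradient.
    ranked = sorted(
        verified,
        key=lambda task: cosine(solver_grad(task), reference_grad),
        reverse=True,
    )
    return skills, ranked[:k]


if __name__ == "__main__":
    traces = [
        Trace("t1", False, "wrong-file-edited"),
        Trace("t2", False, "missing-test-update"),
        Trace("t3", False, "wrong-file-edited"),
        Trace("t4", True),
    ]
    toy_grads = {
        "repair:wrong-file-edited": [1.0, 0.0],
        "repair:missing-test-update": [0.6, 0.8],
    }
    skills, selected = run_round(
        traces,
        generate_tasks=lambda s: [f"repair:{tag}" for tag in s] + ["repair:flaky"],
        validate=lambda task: task in toy_grads,  # "repair:flaky" fails validation
        solver_grad=lambda task: toy_grads[task],
        reference_grad=[0.0, 1.0],
        k=1,
    )
    print(skills)    # ['wrong-file-edited', 'missing-test-update']
    print(selected)  # ['repair:missing-test-update']
```

The generator, validator, and gradient function are injected as callables so the sketch stays independent of any particular model or training framework. In a real system they would presumably wrap LLM calls, sandboxed test execution, and backpropagation, respectively. The updated Solver's new traces would then feed the next call to `run_round`.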