SEVerA: Verified Synthesis of Self-Evolving Agents

March 26, 2026
Authors: Debangshu Banerjee, Changming Xu, Gagandeep Singh
cs.AI

Abstract

Recent advances have shown the effectiveness of self-evolving LLM agents on tasks such as program repair and scientific discovery. In this paradigm, a planner LLM synthesizes an agent program that invokes parametric models, including LLMs, which are then tuned per task to improve performance. However, existing self-evolving agent frameworks provide no formal guarantees of safety or correctness. Because such programs are often executed autonomously on unseen inputs, this lack of guarantees raises reliability and security concerns. We formulate agentic code generation as a constrained learning problem, combining hard formal specifications with soft objectives capturing task utility. We introduce Formally Guarded Generative Models (FGGM), which allow the planner LLM to specify a formal output contract for each generative model call using first-order logic. Each FGGM call wraps the underlying model in a rejection sampler with a verified fallback, ensuring every returned output satisfies the contract for any input and parameter setting. Building on FGGM, we present SEVerA (Self-Evolving Verified Agents), a three-stage framework: Search synthesizes candidate parametric programs containing FGGM calls; Verification proves correctness with respect to hard constraints for all parameter values, reducing the problem to unconstrained learning; and Learning applies scalable gradient-based optimization, including GRPO-style fine-tuning, to improve the soft objective while preserving correctness. We evaluate SEVerA on Dafny program verification, symbolic math synthesis, and policy-compliant agentic tool use (τ^2-bench). Across tasks, SEVerA achieves zero constraint violations while improving performance over unconstrained and SOTA baselines, showing that formal behavioral constraints not only guarantee correctness but also steer synthesis toward higher-quality agents.
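The core FGGM mechanism described in the abstract, wrapping a generative model in a rejection sampler with a verified fallback so that every returned output satisfies its contract, can be sketched as follows. This is a minimal illustration only; the function name `fggm_call`, the toy sampler, and the evenness contract are hypothetical and not the paper's actual API.

```python
import random

def fggm_call(model, contract, fallback, max_tries=4):
    """Hypothetical FGGM wrapper: rejection-sample from `model`,
    checking each candidate against `contract`. If no candidate
    passes within the sampling budget, return the pre-verified
    `fallback`, so the caller always receives a contract-satisfying
    output regardless of the underlying model's behavior."""
    for _ in range(max_tries):
        candidate = model()
        if contract(candidate):
            return candidate
    return fallback  # assumed verified a priori to satisfy `contract`

# Toy instantiation: the "model" samples integers, the contract
# requires an even result, and 0 is the verified fallback value.
is_even = lambda x: x % 2 == 0
out = fggm_call(lambda: random.randint(0, 9), is_even, fallback=0)
assert is_even(out)  # holds on every call, by construction
```

Because the fallback is verified once, up front, the contract holds for any input and any parameter setting of the wrapped model, which is what lets the paper's Verification stage discharge the hard constraints independently of the Learning stage.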