AgentSPEX: An Agent SPecification and EXecution Language

Abstract

Language-model agent systems commonly rely on reactive prompting, in which a single instruction guides the model through an open-ended sequence of reasoning and tool-use steps, leaving control flow and intermediate state implicit and making agent behavior potentially difficult to control. Orchestration frameworks such as LangGraph, DSPy, and CrewAI impose greater structure through explicit workflow definitions, but tightly couple workflow logic with Python, making agents difficult to maintain and modify. In this paper, we introduce AgentSPEX, an Agent SPecification and EXecution Language for specifying LLM-agent workflows with explicit control flow and modular structure, along with a customizable agent harness. AgentSPEX supports typed steps, branching and loops, parallel execution, reusable submodules, and explicit state management, and these workflows execute within an agent harness that provides tool access, a sandboxed virtual environment, and support for checkpointing, verification, and logging. Furthermore, we provide a visual editor with synchronized graph and workflow views for authoring and inspection. We include ready-to-use agents for deep research and scientific research, and we evaluate AgentSPEX on 7 benchmarks. Finally, we show through a user study that AgentSPEX provides a more interpretable and accessible workflow-authoring paradigm than a popular existing agent framework.
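To make the contrast with reactive prompting concrete, the following is a minimal Python sketch of what explicit control flow and explicit state could look like in a draft-and-verify loop. It is not AgentSPEX syntax, which this abstract does not show. The names `Step`, `run_workflow`, `draft`, `verify`, and `fake_llm` are hypothetical, and the model call is replaced by a deterministic stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]  # explicit, inspectable intermediate state


@dataclass
class Step:
    """One named workflow step (hypothetical type, not AgentSPEX's)."""
    name: str
    run: Callable[[State], None]                 # reads and updates the shared state
    next_step: Callable[[State], Optional[str]]  # explicit branching decision


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call (assumption for this sketch)."""
    return f"draft for: {prompt}"


def draft(state: State) -> None:
    state["draft"] = fake_llm(str(state["question"]))
    state["attempts"] = int(state.get("attempts", 0)) + 1


def verify(state: State) -> None:
    # Toy verification rule: accept the draft on the second attempt.
    state["ok"] = int(state["attempts"]) >= 2


def run_workflow(steps: List[Step], start: str, state: State) -> List[str]:
    """Run steps by name until a step returns None; return the execution trace."""
    by_name = {s.name: s for s in steps}
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        step = by_name[current]
        step.run(state)
        trace.append(step.name)  # logging: which steps ran, in order
        current = step.next_step(state)
    return trace


steps = [
    Step("draft", draft, lambda s: "verify"),
    # Loop back to "draft" until verification passes.
    Step("verify", verify, lambda s: None if s["ok"] else "draft"),
]
state: State = {"question": "What is AgentSPEX?"}
trace = run_workflow(steps, "draft", state)
print(trace)
```

Because the loop condition and the state dictionary sit outside the prompt, the trace and every intermediate value can be inspected or checkpointed after each step, which is the property the abstract attributes to explicit workflow definitions.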