From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Abstract

Systems based on large language models (LLMs) increasingly solve tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification. This survey reviews recent methods for designing and optimizing such workflows, which we treat as agentic computation graphs (ACGs). We organize the literature by when workflow structure is determined, where structure refers to which components or agents are present, how they depend on each other, and how information flows between them. This lens distinguishes static methods, which fix a reusable workflow scaffold before deployment, from dynamic methods, which select, generate, or revise the workflow for a particular run before or during execution. We further organize prior work along three dimensions: when structure is determined, what part of the workflow is optimized, and which evaluation signals guide optimization (e.g., task metrics, verifier signals, preferences, or trace-derived feedback). We also distinguish reusable workflow templates, run-specific realized graphs, and execution traces, separating reusable design choices from the structures actually deployed in a given run and from realized runtime behavior. Finally, we outline a structure-aware evaluation perspective that complements downstream task metrics with graph-level properties, execution cost, robustness, and structural variation across inputs. Our goal is to provide a clear vocabulary, a unified framework for positioning new methods, a more comparable view of the existing literature, and a more reproducible evaluation standard for future work on workflow optimization for LLM agents.
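The distinction between workflow templates, realized graphs, and execution traces can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the surveyed methods: the class names (`WorkflowTemplate`, `RealizedGraph`, `ExecutionTrace`), the node names, the `realize` selection rule, and the cost accounting are all hypothetical assumptions chosen to show how one reusable template might yield different run-specific graphs and traces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class WorkflowTemplate:
    """Reusable scaffold fixed before deployment (hypothetical representation)."""
    nodes: frozenset[str]
    edges: frozenset[tuple[str, str]]


@dataclass(frozen=True)
class RealizedGraph:
    """Structure actually deployed for one run; here a subgraph of the template."""
    nodes: frozenset[str]
    edges: frozenset[tuple[str, str]]


@dataclass
class ExecutionTrace:
    """Ordered record of what ran, plus an illustrative cost counter."""
    steps: list[str] = field(default_factory=list)
    llm_calls: int = 0


# Assumed example template: plan, optionally retrieve, answer, then verify.
TEMPLATE = WorkflowTemplate(
    nodes=frozenset({"plan", "retrieve", "answer", "verify"}),
    edges=frozenset({
        ("plan", "retrieve"),
        ("retrieve", "answer"),
        ("plan", "answer"),
        ("answer", "verify"),
    }),
)


def realize(template: WorkflowTemplate, needs_retrieval: bool) -> RealizedGraph:
    """Hypothetical dynamic step: drop the retrieval node when the input does not need it."""
    nodes = set(template.nodes)
    if not needs_retrieval:
        nodes.discard("retrieve")
    # Keep only edges whose endpoints both survive the selection.
    edges = {(u, v) for (u, v) in template.edges if u in nodes and v in nodes}
    return RealizedGraph(frozenset(nodes), frozenset(edges))


def run(graph: RealizedGraph) -> ExecutionTrace:
    """Visit nodes in dependency order and record the resulting trace."""
    predecessors: dict[str, set[str]] = {n: set() for n in graph.nodes}
    for u, v in graph.edges:
        predecessors[v].add(u)
    trace = ExecutionTrace()
    for node in TopologicalSorter(predecessors).static_order():
        trace.steps.append(node)
        if node != "retrieve":
            # Assumption: every node except retrieval costs one LLM call.
            trace.llm_calls += 1
    return trace


if __name__ == "__main__":
    for flag in (True, False):
        graph = realize(TEMPLATE, needs_retrieval=flag)
        trace = run(graph)
        print(flag, sorted(graph.nodes), trace.steps, trace.llm_calls)
```

In this toy setting, a static method would always execute the full template, whereas a dynamic method would call something like `realize` per input. Comparing realized graphs across inputs is one simple way to observe the structural variation that a structure-aware evaluation would report alongside task metrics and cost.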