TodoEvolve: Learning to Architect Agent Planning Systems
February 8, 2026
Authors: Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan
cs.AI
Abstract
Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
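To make the idea of a common interface for heterogeneous planning patterns more concrete, the following is a minimal sketch of what a four-stage planner interface might look like. It is an illustration only: the abstract names the four stages (topology, initialization, adaptation, navigation) but does not describe PlanFactory's actual API, so `PlanNode`, `Planner`, `LinearTodoPlanner`, `run`, and all method signatures are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass
class PlanNode:
    """One unit of work in a plan (hypothetical structure, not from the paper)."""
    name: str
    done: bool = False
    deps: List[str] = field(default_factory=list)


class Planner(Protocol):
    """Hypothetical interface exposing the four stages named in the abstract."""

    def topology(self) -> str: ...
    def initialize(self, task: str) -> List[PlanNode]: ...
    def adapt(self, plan: List[PlanNode], feedback: str) -> List[PlanNode]: ...
    def navigate(self, plan: List[PlanNode]) -> Optional[PlanNode]: ...


class LinearTodoPlanner:
    """Toy planner: a chain in which each step depends on the previous one."""

    def topology(self) -> str:
        return "chain"

    def initialize(self, task: str) -> List[PlanNode]:
        # Assumption: the task arrives as semicolon-separated, uniquely named steps.
        steps = [s.strip() for s in task.split(";") if s.strip()]
        return [
            PlanNode(step, deps=[steps[i - 1]] if i > 0 else [])
            for i, step in enumerate(steps)
        ]

    def adapt(self, plan: List[PlanNode], feedback: str) -> List[PlanNode]:
        # Toy revision rule: append a retry step when the feedback reports a failure.
        if "fail" in feedback.lower():
            deps = [plan[-1].name] if plan else []
            return plan + [PlanNode("retry", deps=deps)]
        return plan

    def navigate(self, plan: List[PlanNode]) -> Optional[PlanNode]:
        # Pick the first unfinished node whose dependencies are all finished.
        finished = {n.name for n in plan if n.done}
        for node in plan:
            if not node.done and all(d in finished for d in node.deps):
                return node
        return None


def run(planner: Planner, task: str) -> List[str]:
    """Drive any planner that satisfies the interface; return the execution order."""
    plan = planner.initialize(task)
    order: List[str] = []
    while True:
        node = planner.navigate(plan)
        if node is None:
            break
        node.done = True  # stand-in for actually executing the step
        order.append(node.name)
    return order
```

Because `run` depends only on the interface, a tree- or graph-shaped planner could be swapped in without changing the driver, which is the kind of interchangeability a unified design space would presumably need.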