TodoEvolve: Learning to Architect Agent Planning Systems

February 8, 2026
Authors: Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan
cs.AI

Abstract

Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
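
The abstract describes PlanFactory as a common interface over four facets of a planning system: topology, initialization, adaptation, and navigation. A minimal sketch of what such a decomposition could look like follows. Every name in it (`Plan`, `PlanningSystem`, `linear_init`, `mark_done`, `next_ready`) is hypothetical and chosen for illustration only; none of it is taken from the PlanFactory codebase, and the real interface may differ substantially.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Plan:
    """Hypothetical plan container: each step maps to its prerequisite steps."""
    deps: Dict[str, List[str]] = field(default_factory=dict)
    done: Set[str] = field(default_factory=set)


@dataclass
class PlanningSystem:
    """Four pluggable stages mirroring the four facets named in the abstract."""
    topology: Callable[[], Plan]              # which structure holds the plan
    initialize: Callable[[Plan, str], None]   # populate the structure for a task
    adapt: Callable[[Plan, str, bool], None]  # revise the plan after feedback
    navigate: Callable[[Plan], Optional[str]] # choose the next step, or None

    def run(self, task: str, execute: Callable[[str], bool],
            max_steps: int = 20) -> List[str]:
        plan = self.topology()
        self.initialize(plan, task)
        trace: List[str] = []
        for _ in range(max_steps):
            step = self.navigate(plan)
            if step is None:  # nothing left to do
                break
            succeeded = execute(step)
            trace.append(step)
            self.adapt(plan, step, succeeded)
        return trace


def linear_init(plan: Plan, task: str) -> None:
    """Fill the plan with a fixed three-step chain (an illustrative choice)."""
    prev: Optional[str] = None
    for name in ("gather", "solve", "verify"):
        step = f"{name}: {task}"
        plan.deps[step] = [prev] if prev else []
        prev = step


def mark_done(plan: Plan, step: str, succeeded: bool) -> None:
    """Mark a step complete on success; on failure leave it pending for a retry."""
    if succeeded:
        plan.done.add(step)


def next_ready(plan: Plan) -> Optional[str]:
    """Return the first unfinished step whose prerequisites are all done."""
    for step, deps in plan.deps.items():
        if step not in plan.done and all(d in plan.done for d in deps):
            return step
    return None


system = PlanningSystem(Plan, linear_init, mark_done, next_ready)
trace = system.run("demo", execute=lambda step: True)
```

Swapping any one of the four callables (for example, a tree-shaped `topology` or a replanning `adapt`) yields a different planning system behind the same `run` loop, which is the kind of composition a meta-planner could search over.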
February 12, 2026