TodoEvolve: Learning to Architect Agent Planning Systems

February 8, 2026
Authors: Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan
cs.AI

Abstract

Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
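
To make the four PlanFactory components concrete, here is a minimal sketch of how topology, initialization, adaptation, and navigation could sit behind one common interface. The abstract does not specify PlanFactory's actual API, so every name here (`PlanningSystem`, `linear_init`, `keep_plan`, `first_ready`, `run`) is a hypothetical illustration, and the plan-as-dependency-map representation is an assumption rather than the authors' design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Hypothetical sketch: none of these names come from the paper.
# Topology (assumed representation): each step maps to its prerequisite steps.
Plan = Dict[str, List[str]]


@dataclass
class PlanningSystem:
    """Bundles the three behaviors that operate on a plan topology."""
    init: Callable[[str], Plan]                          # initialization: task -> plan
    adapt: Callable[[Plan, str, str], Plan]              # adaptation: (plan, step, feedback) -> plan
    navigate: Callable[[Plan, Set[str]], Optional[str]]  # navigation: choose the next step


def linear_init(task: str) -> Plan:
    """Build a chain topology from a ';'-separated task description."""
    steps = [s.strip() for s in task.split(";") if s.strip()]
    # Each step depends on the step before it; the first has no prerequisites.
    return {step: steps[:i][-1:] for i, step in enumerate(steps)}


def keep_plan(plan: Plan, step: str, feedback: str) -> Plan:
    """Trivial adaptation policy: never revise the plan."""
    return plan


def first_ready(plan: Plan, done: Set[str]) -> Optional[str]:
    """Return the first unfinished step whose prerequisites are all done."""
    for step, deps in plan.items():
        if step not in done and all(d in done for d in deps):
            return step
    return None


def run(system: PlanningSystem, task: str, execute: Callable[[str], str]) -> List[str]:
    """Drive any planning system through the same init/navigate/adapt loop."""
    plan = system.init(task)
    done: Set[str] = set()
    order: List[str] = []
    while (step := system.navigate(plan, done)) is not None:
        feedback = execute(step)
        plan = system.adapt(plan, step, feedback)
        done.add(step)
        order.append(step)
    return order


system = PlanningSystem(init=linear_init, adapt=keep_plan, navigate=first_ready)
order = run(system, "search; read; answer", execute=lambda step: "ok")
```

Under this assumed interface, swapping in a different `init`, `adapt`, or `navigate` function changes the planning pattern without touching the `run` loop, which is one way to read the abstract's "common interface for heterogeneous planning patterns".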