TodoEvolve: Learning to Architect Agent Planning Systems
February 8, 2026
Authors: Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan
cs.AI
Abstract
Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
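To make the idea of a common interface over topology, initialization, adaptation, and navigation more concrete, the following is a minimal sketch of what such an interface might look like. It is an illustration only, not the actual PlanFactory codebase: the names `PlanNode`, `PlanningSystem`, and `LinearTodoPlanner`, the method signatures, and the semicolon-based task splitting are all assumptions introduced here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """One plan step (hypothetical); `children` encodes the plan topology."""
    description: str
    done: bool = False
    children: List["PlanNode"] = field(default_factory=list)


class PlanningSystem(ABC):
    """Hypothetical common interface; not the real PlanFactory API."""

    @abstractmethod
    def initialize(self, task: str) -> PlanNode:
        """Build the initial plan structure for a task."""

    @abstractmethod
    def adapt(self, root: PlanNode, feedback: str) -> PlanNode:
        """Revise the plan in response to execution feedback."""

    @abstractmethod
    def navigate(self, root: PlanNode) -> Optional[PlanNode]:
        """Pick the next step to execute, or None when nothing is pending."""


class LinearTodoPlanner(PlanningSystem):
    """A flat to-do list, the simplest topology this interface can express."""

    def initialize(self, task: str) -> PlanNode:
        # Toy initialization: split the task text on semicolons into steps.
        steps = [s.strip() for s in task.split(";") if s.strip()]
        return PlanNode(task, children=[PlanNode(s) for s in steps])

    def adapt(self, root: PlanNode, feedback: str) -> PlanNode:
        # Toy adaptation: append a retry step when feedback mentions a failure.
        if "fail" in feedback.lower():
            root.children.append(PlanNode(f"retry after: {feedback}"))
        return root

    def navigate(self, root: PlanNode) -> Optional[PlanNode]:
        # Return the first unfinished step in list order.
        return next((c for c in root.children if not c.done), None)


# Usage: build a plan, complete one step, then revise after a failure.
planner = LinearTodoPlanner()
plan = planner.initialize("search sources; read papers; write summary")
first = planner.navigate(plan)
first.done = True
plan = planner.adapt(plan, "Search failed: timeout")
```

Under this sketch, a tree- or graph-structured planner would subclass the same `PlanningSystem` base and differ only in how it populates `children` and chooses the next node, which is one way heterogeneous planning patterns could share an interface.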