Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Abstract

Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has focused predominantly on single-iteration transfer, we find that under multi-iteration experience learning, existing methods suffer progressive capability collapse rather than compounding improvement. We systematically examine this failure along three key dimensions of experience internalization: (1) Experience Granularity: We find that principle-level experience is more durable than instance-level experience, because it abstracts transferable strategies away from trajectory-specific details. (2) Experience Injection Pattern: Our analysis reveals that step-wise injection significantly outperforms global injection by aligning experience with intermediate decision states, a property that is critical for long-horizon tool use. (3) Internalization Regime: We demonstrate that off-policy context distillation on high-quality teacher trajectories provides a substantially more stable training signal than on-policy context distillation, which is inherently limited to local corrections on flawed states induced by the student. Together, these insights yield a simple yet robust recipe for stable and sustainable experience internalization, providing concrete guidance for engineering self-evolving and continually learning LLMs.
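To make the injection patterns and the off-policy regime concrete, the sketch below shows one plausible way to build the data. It is an illustration under stated assumptions, not the paper's implementation: the `Step` record, the prompt templates, and all function names are hypothetical. It assumes a teacher trajectory has already been collected with experience in context, and that the student is then trained on the same teacher actions with the experience removed from the prompt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One step of a teacher trajectory (hypothetical schema)."""
    state: str      # intermediate decision state / observation
    action: str     # action the teacher took at this state
    principle: str  # principle-level experience relevant to this step


def global_injection(task: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """Teacher prompts where all experience is prepended once, up front."""
    unique = dict.fromkeys(s.principle for s in steps)  # de-duplicate, keep order
    header = "Experience:\n" + "\n".join(f"- {p}" for p in unique)
    return [(f"{header}\nTask: {task}\nState: {s.state}", s.action) for s in steps]


def stepwise_injection(task: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """Teacher prompts where each state sees only the experience tied to it."""
    return [
        (f"Task: {task}\nState: {s.state}\nExperience: {s.principle}", s.action)
        for s in steps
    ]


def distillation_pairs(task: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """Student (prompt, target) pairs for off-policy context distillation.

    The states and targets come from the teacher trajectory, not from student
    rollouts, and the experience text is dropped from the student prompt.
    """
    return [(f"Task: {task}\nState: {s.state}", s.action) for s in steps]


# Toy trajectory; the contents are made up for illustration only.
steps = [
    Step("search returned 0 hits", "search(query='broader terms')",
         "Broaden the query when a search returns nothing."),
    Step("file list shown", "open('README.md')",
         "Read documentation before editing code."),
]
task = "fix the failing build"
global_prompts = global_injection(task, steps)
teacher_prompts = stepwise_injection(task, steps)
student_pairs = distillation_pairs(task, steps)
```

In this reading, the supervised targets are identical across the two injection patterns. What differs is the context under which the teacher produced them, and the student must reproduce the actions without that context.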