Rethinking Continual Experience Internalization in Self-Evolving LLM Agents

Abstract

Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). Prior work has focused mainly on single-iteration transfer. We find that under multi-iteration experience learning, existing methods suffer progressive capability collapse rather than compounding improvement. We systematically examine this failure along three dimensions of experience internalization. (1) Experience granularity: principle-level experience is more durable than instance-level experience because it abstracts transferable strategies away from trajectory-specific details. (2) Experience injection pattern: step-wise injection significantly outperforms global injection by aligning experience with intermediate decision states, a property that is critical for long-horizon tool use. (3) Internalization regime: off-policy context distillation on high-quality teacher trajectories provides a substantially more stable training signal than on-policy context distillation, which is inherently limited to local corrections on flawed states induced by the student. Together, these insights yield a simple yet robust recipe for stable and sustainable experience internalization and provide concrete guidance for engineering self-evolving, continually learning LLMs.
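
The contrast between global and step-wise injection, and the way off-policy context distillation pairs are formed, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the prompt layout, and the word-overlap retriever are hypothetical choices made for illustration, and the sketch assumes that global injection means placing the full experience set in the context of every step, while step-wise injection means attaching only the experience matched to the current decision state.

```python
from typing import Callable, List, Tuple


def format_experience(experiences: List[str]) -> str:
    """Render experience items as a plain bulleted block."""
    return "Experience:\n" + "\n".join(f"- {e}" for e in experiences)


def retrieve(state: str, experiences: List[str], k: int = 1) -> List[str]:
    """Hypothetical retriever: rank experience by word overlap with the state."""
    state_words = set(state.lower().split())
    ranked = sorted(
        experiences,
        key=lambda e: len(state_words & set(e.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def global_injection(task: str, states: List[str], experiences: List[str]) -> List[str]:
    """Global injection (assumed form): the full experience block precedes every step."""
    header = format_experience(experiences)
    return [f"{header}\n\nTask: {task}\nState: {s}" for s in states]


def stepwise_injection(
    task: str, states: List[str], experiences: List[str], k: int = 1
) -> List[str]:
    """Step-wise injection (assumed form): each step sees experience matched to its state."""
    return [
        f"Task: {task}\nState: {s}\n{format_experience(retrieve(s, experiences, k))}"
        for s in states
    ]


def offpolicy_pairs(
    prompts_with_exp: List[str],
    prompts_plain: List[str],
    teacher: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Off-policy context distillation data: the teacher acts with experience in
    context, and each teacher action is paired with the experience-free prompt
    that the student would later be trained on."""
    return [
        (plain, teacher(with_exp))
        for with_exp, plain in zip(prompts_with_exp, prompts_plain)
    ]


if __name__ == "__main__":
    exps = [
        "check the search tool result before you call the booking tool",
        "retry a failed payment call once then report the error",
    ]
    steps = ["search tool returned three flights", "payment call failed with timeout"]
    for prompt in stepwise_injection("book a flight", steps, exps):
        print(prompt, end="\n\n")
```

In the on-policy variant described in the abstract, the states would instead come from the student's own rollouts, so the teacher could only correct whatever (possibly flawed) states the student reaches. The sketch covers only the off-policy data construction.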