Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Abstract

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks, yet manual authoring creates a severe scalability bottleneck. Automated skill generation, in turn, often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable, trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them, via inductive reasoning, into a unified, conflict-free skill directory. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheets, VisionQA, and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to out-of-distribution (OOD) settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills, with no parameter updates, no external retrieval modules, and open-source models as small as 35B parameters.
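The analyze-in-parallel, then consolidate-hierarchically flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Trajectory` type, the function names, the `fan_in` parameter, and the example notes are all hypothetical. The sub-agent and the inductive merge step, which in the paper are LLM-driven, are replaced here by trivial stand-ins (string tagging and order-preserving deduplication) so that the control flow can be run on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one agent execution."""
    task_id: str
    success: bool
    notes: Tuple[str, ...]  # made-up observations logged during the run


def analyze_trajectory(traj: Trajectory) -> List[str]:
    """Stand-in for one sub-agent; a real system would call an LLM here."""
    prefix = "do" if traj.success else "avoid"
    return [f"{prefix}: {note}" for note in traj.notes]


def merge_lessons(groups: Sequence[List[str]]) -> List[str]:
    """Stand-in for an inductive merge: deduplicate, keeping first-seen order."""
    seen = {}
    for lessons in groups:
        for lesson in lessons:
            seen.setdefault(lesson, None)
    return list(seen)


def consolidate(lesson_sets: Sequence[List[str]], fan_in: int = 2) -> List[str]:
    """Merge lesson sets level by level until a single set remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(lesson_sets)
    while len(level) > 1:
        # Each pass merges groups of `fan_in` neighbours into one set.
        level = [merge_lessons(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return merge_lessons(level)


def build_skill(trajectories: Sequence[Trajectory], max_workers: int = 4) -> List[str]:
    """Analyze all trajectories in parallel, then consolidate the lessons."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        lesson_sets = list(pool.map(analyze_trajectory, trajectories))
    return consolidate(lesson_sets)
```

Because every trajectory is analyzed independently before any merging happens, no single run can overwrite the skill on its own, which is the contrast with sequential updating that the abstract draws. How conflicts between lessons are actually resolved is left to the merge step and is not modeled in this sketch.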
