From Context to Skills: Can Language Models Learn from Context Skillfully?

Abstract

Many real-world tasks require language models (LMs) to reason over complex contexts that go beyond their parametric knowledge. This calls for context learning, in which an LM learns the relevant knowledge directly from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures in the context as natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manually annotating skills for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core is a multi-agent self-play loop with three roles: a Challenger that generates probing tasks and rubrics, a Reasoner that attempts to solve them guided by an evolving skill set, and a neutral Judge that provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that, on the Reasoner side, identifies the skill set achieving the best balance across representative cases, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to improve its context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.
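The self-play loop and replay step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (Task, Agents, evolve_skills, pass_count, toy_agents) is hypothetical, the Proposer and Generator are collapsed into a single write_skill callable, a solved task is treated as a Challenger failure, and "best balance" is approximated as the highest pass count over all tasks seen so far. The toy agents are deterministic stand-ins for LM calls.

```python
from dataclasses import dataclass
from typing import Callable, List

Skills = List[str]


@dataclass(frozen=True)
class Task:
    question: str
    rubric: str  # in this toy, simply the expected answer


@dataclass
class Agents:
    # Each field would wrap an LM call; plain callables keep the sketch runnable.
    challenger: Callable[[str, Skills], Task]
    reasoner: Callable[[str, Task, Skills], str]
    judge: Callable[[Task, str], bool]
    # Proposer + Generator collapsed into one step (assumption): case -> skill text.
    write_skill: Callable[[Task, str, bool], str]


def pass_count(context: str, agents: Agents, tasks: List[Task], skills: Skills) -> int:
    """Number of tasks the Reasoner solves with the given skill set."""
    return sum(agents.judge(t, agents.reasoner(context, t, skills)) for t in tasks)


def evolve_skills(context: str, agents: Agents, rounds: int) -> Skills:
    challenger_skills: Skills = []
    reasoner_skills: Skills = []
    snapshots: List[Skills] = [[]]  # Reasoner skills before round 1, then after each round
    seen_tasks: List[Task] = []

    for _ in range(rounds):
        task = agents.challenger(context, challenger_skills)
        answer = agents.reasoner(context, task, reasoner_skills)
        solved = agents.judge(task, answer)  # binary feedback
        seen_tasks.append(task)

        skill = agents.write_skill(task, answer, solved)
        # Assumption: a solved task is a Challenger failure, an unsolved one a Reasoner failure.
        if solved:
            challenger_skills = challenger_skills + [skill]
        else:
            reasoner_skills = reasoner_skills + [skill]
        snapshots.append(list(reasoner_skills))

    # Replay step (assumed scoring): re-run every snapshot on all seen tasks and
    # keep the best one; max() returns the earliest snapshot on ties.
    return max(snapshots, key=lambda s: pass_count(context, agents, seen_tasks, s))


def toy_agents() -> Agents:
    """Deterministic stand-ins for LM calls, for illustration only."""

    def challenger(context: str, skills: Skills) -> Task:
        n = len(skills)  # each new Challenger skill unlocks a new question
        return Task(f"q{n}", f"a{n}")

    def reasoner(context: str, task: Task, skills: Skills) -> str:
        for s in skills:
            if s.startswith(f"for {task.question} "):
                return s.split()[-1]
        return "?"

    def judge(task: Task, answer: str) -> bool:
        return answer == task.rubric

    def write_skill(task: Task, answer: str, solved: bool) -> str:
        return f"for {task.question} answer {task.rubric}"

    return Agents(challenger, reasoner, judge, write_skill)
```

With the toy agents, evolve_skills("ctx", toy_agents(), 4) alternates between a Reasoner failure and a Challenger failure, and the replay step selects the two-skill snapshot because it solves every task seen. In a real system each callable would be a prompted LM, and the replay set would presumably be a curated subset of cases rather than all of them.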

