
From Context to Skills: Can Language Models Learn from Context Skillfully?

May 3, 2026
作者: Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun
cs.AI

Abstract

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures from context into natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manual skill annotation for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core, a multi-agent self-play loop has a Challenger that generates probing tasks and rubrics, a Reasoner that attempts to solve them guided by an evolving skill set, and a neutral Judge that provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that identifies the skill set achieving the best balance across representative cases for the Reasoner side, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to obtain better context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.
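The self-play loop in the abstract can be sketched in miniature. This is an illustrative toy, not the authors' implementation: all agent behaviors (Challenger, Reasoner, Judge, and the Proposer/Generator step) are stubbed with random placeholders, and the Cross-time Replay selection is simplified to picking the snapshot whose skill set scores best on replayed probe tasks.

```python
import random

random.seed(0)

def challenger(context, skills):
    """Generate a probing task and rubric from the context (stubbed)."""
    return {"task": f"probe-{random.randint(0, 9)}", "rubric": "binary"}

def reasoner(task, skills):
    """Attempt the task guided by the current skill set.
    Assumption for the sketch: more skills raise success odds."""
    return random.random() < 0.3 + 0.05 * len(skills)

def judge(task, solved):
    """Neutral judge: binary feedback on the attempt."""
    return solved

def propose_skill(task):
    """Proposer/Generator step: turn a failure into a skill update."""
    return f"skill-for-{task['task']}"

def self_play(context, rounds=20):
    skills, snapshots = [], []
    for t in range(rounds):
        task = challenger(context, skills)
        ok = judge(task, reasoner(task, skills))
        if not ok:
            # Failure analysis -> targeted skill discovery/refinement.
            skills.append(propose_skill(task))
        snapshots.append(list(skills))
    # Cross-time Replay (simplified): keep the snapshot whose skill set
    # solves the most replayed representative probe tasks, guarding
    # against over-specialized late-stage skill accumulation.
    best = max(snapshots, key=lambda s: sum(
        reasoner(challenger(context, s), s) for _ in range(10)))
    return best

final_skills = self_play("example context")
print(f"retained {len(final_skills)} skills")
```

The retained skill set would then be prepended to any backbone model's prompt; the key design point mirrored here is that skills accumulate only from judged failures, while replay selects a balanced snapshot rather than the final (possibly over-specialized) one.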