From Context to Skills: Can Language Models Learn from Context Skillfully?
May 3, 2026
Authors: Shuzheng Si, Haozhe Zhao, Yu Lei, Qingyi Wang, Dingwei Chen, Zhitong Wang, Zhenhailong Wang, Kangyang Luo, Zheng Wang, Gang Chen, Fanchao Qi, Minjia Zhang, Maosong Sun
cs.AI
Abstract
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures in the context into natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manually annotating skills for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core is a multi-agent self-play loop: a Challenger generates probing tasks and rubrics, a Reasoner attempts to solve them guided by an evolving skill set, and a neutral Judge provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that selects, for the Reasoner, the skill set achieving the best balance across representative cases, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to improve its context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.
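The self-play loop described in the abstract can be sketched in a few lines. Everything below is a toy stand-in, not the paper's implementation: the actual agents are LM-driven, whereas `toy_challenger`, `toy_reasoner`, `toy_judge`, and `toy_proposer_generator` here are deterministic stubs over a "context" of key-value facts, and the Challenger's own skill co-evolution is omitted for brevity.

```python
# Toy sketch of the Ctx2Skill self-play loop (hypothetical stubs, not the
# paper's LM-driven agents). The "context" is a list of (key, value) facts.

def toy_challenger(context, step):
    """Challenger: generate a probing task and its rubric from the context."""
    key, value = context[step % len(context)]
    return {"question": f"What pairs with {key}?", "rubric": value}

def toy_reasoner(task, skills):
    """Reasoner: attempt the task guided by the current skill set."""
    return skills.get(task["question"], "unknown")

def toy_judge(answer, rubric):
    """Judge: neutral binary feedback against the rubric."""
    return answer == rubric

def toy_proposer_generator(task, skills):
    """Proposer/Generator: turn a failure case into a targeted skill update."""
    skills[task["question"]] = task["rubric"]

def self_play(context, rounds=6):
    """Run the loop, snapshotting the skill set after every round."""
    skills, snapshots = {}, []
    for step in range(rounds):
        task = toy_challenger(context, step)
        answer = toy_reasoner(task, skills)
        if not toy_judge(answer, task["rubric"]):
            toy_proposer_generator(task, skills)
        snapshots.append(dict(skills))  # keep history for Cross-time Replay
    return snapshots

def cross_time_replay(snapshots, replay_cases):
    """Pick the snapshot whose skills solve the most representative cases."""
    return max(snapshots,
               key=lambda s: sum(s.get(q) == a for q, a in replay_cases))

context = [("A", "1"), ("B", "2"), ("C", "3")]
snapshots = self_play(context)
best = cross_time_replay(snapshots, [("What pairs with A?", "1"),
                                     ("What pairs with C?", "3")])
```

The design choice mirrored here is that skill updates come only from the Judge's binary failure signal, with no human labels or external reward, and that replay selects over the whole history of skill sets rather than trusting the final one.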