From Context to Skills: Can Language Models Learn from Context Skillfully?

Abstract

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, in which LMs learn the relevant knowledge directly from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures in a context into natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manually annotating skills for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core is a multi-agent self-play loop with a Challenger that generates probing tasks and rubrics, a Reasoner that attempts to solve those tasks guided by an evolving skill set, and a neutral Judge that provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent the adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that identifies, for the Reasoner side, the skill set achieving the best balance across representative cases, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to improve its context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.
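
The self-play loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Ctx2Skill implementation: every name here (`Case`, `evolve_skills`, `make_skill`, and the toy agents) is hypothetical, the Proposer and Generator are collapsed into a single `make_skill` callable, the rule that a Reasoner failure updates the Reasoner's skills while a Reasoner success updates the Challenger's skills is assumed, and Cross-time Replay is approximated by re-scoring every saved Reasoner skill snapshot on all collected cases and keeping the best one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Case:
    task: str    # probing task produced by the Challenger
    rubric: str  # rubric the Judge checks the answer against


Skills = List[str]


def evolve_skills(
    context: str,
    challenger: Callable[[str, Skills], Case],
    reasoner: Callable[[str, str, Skills], str],
    judge: Callable[[Case, str], bool],
    make_skill: Callable[[str, Case, str, bool], str],  # stands in for Proposer + Generator
    rounds: int = 3,
) -> Skills:
    reasoner_skills: Skills = []
    challenger_skills: Skills = []
    snapshots: List[Skills] = [[]]  # Reasoner skill set after each round
    cases: List[Case] = []          # cases kept for replay

    for _ in range(rounds):
        case = challenger(context, challenger_skills)
        answer = reasoner(context, case.task, reasoner_skills)
        passed = judge(case, answer)  # binary feedback
        cases.append(case)
        skill = make_skill(context, case, answer, passed)
        # Assumption: the side that "lost" the round receives the new skill.
        if passed:
            challenger_skills = challenger_skills + [skill]
        else:
            reasoner_skills = reasoner_skills + [skill]
        snapshots.append(list(reasoner_skills))

    # Replay stand-in: count how many collected cases each snapshot passes.
    def score(skills: Skills) -> int:
        return sum(judge(c, reasoner(context, c.task, skills)) for c in cases)

    return max(snapshots, key=score)  # ties resolve to the earliest snapshot


# Toy agents so the sketch runs without any model calls.
TASKS = ["alpha", "beta"]


def toy_challenger(context: str, skills: Skills) -> Case:
    # Moves on to the next task each time it gains a skill.
    task = TASKS[len(skills) % len(TASKS)]
    return Case(task=task, rubric=task.upper())


def toy_reasoner(context: str, task: str, skills: Skills) -> str:
    # Can only answer tasks it already holds a skill for.
    return task.upper() if task in skills else "?"


def toy_judge(case: Case, answer: str) -> bool:
    return answer == case.rubric


def toy_make_skill(context: str, case: Case, answer: str, passed: bool) -> str:
    return case.task


best = evolve_skills("ctx", toy_challenger, toy_reasoner, toy_judge, toy_make_skill)
print(best)  # ['alpha', 'beta']
```

In a real setting each callable would wrap a language-model call and the skills would be natural-language instructions added to the prompt. Choosing among snapshots rather than always returning the latest skill set is what guards against over-specialized skills in this sketch.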
