
RePo: Language Models with Context Re-Positioning

December 16, 2025
Authors: Huayang Li, Tianyu Zhao, Richard Sproat
cs.AI

Abstract

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. Drawing on Cognitive Load Theory (CLT), we argue that this uninformative structure increases extraneous cognitive load, consuming finite working-memory capacity that should be allocated to deep reasoning and attention allocation. To address this, we propose RePo, a novel mechanism that reduces extraneous load via context re-positioning. Unlike standard approaches, RePo utilizes a differentiable module, f_φ, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. By continually pre-training on the OLMo-2 1B backbone, we demonstrate that RePo significantly enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Detailed analysis reveals that RePo successfully allocates higher attention to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is available at https://github.com/SakanaAI/repo.
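
The abstract specifies only that a differentiable module f_φ produces context-dependent token positions in a dense, non-linear space; it does not describe the module's architecture. The sketch below is a hypothetical illustration of that idea, not the authors' implementation: a small PyTorch MLP (the `RePositioner` class, its `max_pos` range, and the RoPE-style consumer `rope_angles` are all assumptions introduced here) maps token embeddings to continuous scalar positions, which then replace the usual integer indices 0..n-1 in a rotary-position computation.

```python
# Minimal, hypothetical sketch of context re-positioning (PyTorch).
# The paper's abstract only says a differentiable module f_phi assigns
# token positions; this MLP-over-embeddings design is an illustrative
# guess, not the authors' method.
import torch
import torch.nn as nn


class RePositioner(nn.Module):
    """f_phi: token embeddings -> continuous, context-dependent positions."""

    def __init__(self, d_model: int, d_hidden: int = 256, max_pos: float = 4096.0):
        super().__init__()
        self.f_phi = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, 1),
        )
        self.max_pos = max_pos  # assumed upper bound on the position range

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, d_model)
        # One scalar per token, squashed into [0, max_pos) so the model can
        # place related tokens close together in a dense, non-linear space
        # instead of on a fixed integer grid.
        raw = self.f_phi(token_embeds).squeeze(-1)   # (batch, seq_len)
        return torch.sigmoid(raw) * self.max_pos


def rope_angles(positions: torch.Tensor, head_dim: int, base: float = 10000.0):
    # Standard RoPE frequencies, but evaluated at learned real-valued
    # positions rather than integer indices 0..seq_len-1.
    inv_freq = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    angles = positions.unsqueeze(-1) * inv_freq      # (batch, seq, head_dim/2)
    return angles.cos(), angles.sin()


if __name__ == "__main__":
    batch, seq_len, d_model, head_dim = 2, 16, 64, 32
    embeds = torch.randn(batch, seq_len, d_model)
    positions = RePositioner(d_model)(embeds)        # differentiable w.r.t. phi
    cos, sin = rope_angles(positions, head_dim)
    print(positions.shape, cos.shape)                # (2, 16) and (2, 16, 16)
```

Because the positions come from a differentiable map, gradients flow back into f_φ during continual pre-training, which is what would let the model learn to place distant but relevant tokens at nearby positions, consistent with the attention behavior the abstract reports.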