RePo: Language Models with Context Re-Positioning

Abstract

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid, fixed contextual structure by assigning linear or constant positional indices. Drawing on Cognitive Load Theory (CLT), we argue that this uninformative structure increases extraneous cognitive load, consuming finite working memory capacity that should be devoted to deep reasoning and attention allocation. To address this, we propose RePo, a novel mechanism that reduces extraneous load via context re-positioning. Unlike standard approaches, RePo uses a differentiable module, f_φ, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. By continually pre-training on the OLMo-2 1B backbone, we demonstrate that RePo significantly enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Detailed analysis reveals that RePo allocates higher attention to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is available at https://github.com/SakanaAI/repo.
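To make the contrast between a pre-defined integer range and positions assigned by a differentiable module more concrete, the following is a minimal sketch and not the released implementation. It assumes a rotary-style positional encoding and a purely linear stand-in for f_φ; neither detail is stated in the abstract, and the names learned_positions, rotate, and score are hypothetical.

```python
import math


def integer_positions(n):
    """Baseline: the fixed positions 0, 1, ..., n-1."""
    return [float(i) for i in range(n)]


def learned_positions(hidden, w, b=0.0):
    """Hypothetical stand-in for f_phi (an assumption, not the paper's module):
    one real-valued position per token, computed as a linear function of
    that token's hidden state."""
    return [sum(h_i * w_i for h_i, w_i in zip(h, w)) + b for h in hidden]


def rotate(vec, pos, base=10000.0):
    """Rotary-style rotation: each consecutive pair of `vec` is rotated by
    the angle pos * theta, where theta depends on the pair index.
    `pos` may be any real number, not only an integer."""
    d = len(vec)
    out = []
    for k in range(0, d, 2):
        theta = base ** (-k / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x, y = vec[k], vec[k + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def score(q, k, pos_q, pos_k):
    """Unnormalized attention score between a rotated query and key."""
    rq, rk = rotate(q, pos_q), rotate(k, pos_k)
    return sum(a * b for a, b in zip(rq, rk))
```

Under these assumptions, the score depends only on the difference pos_q - pos_k, so a module that places two related tokens at nearby real-valued positions makes them look "close" to attention regardless of how far apart they sit in the input sequence.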

