PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
July 8, 2025
Authors: Zeming Chen, Angelika Romanou, Gail Weiss, Antoine Bosselut
cs.AI
Abstract
Long-context reasoning requires accurately identifying relevant information
in extensive, noisy input contexts. Previous research shows that using
test-time learning to encode context directly into model parameters can
effectively enable reasoning over noisy information. However, meta-learning
methods for enabling test-time learning are prohibitively memory-intensive,
preventing their application to long-context settings. In this work, we propose
PERK (Parameter Efficient Reasoning over Knowledge), a scalable approach for
learning to encode long input contexts using gradient updates to a lightweight
model adapter at test time. Specifically, PERK employs two nested optimization
loops in a meta-training phase. The inner loop rapidly encodes contexts into a
low-rank adapter (LoRA) that serves as a parameter-efficient memory module for
the base model. Concurrently, the outer loop learns to use the updated adapter
to accurately recall and reason over relevant information from the encoded long
context. Our evaluations on several long-context reasoning tasks show that PERK
significantly outperforms the standard prompt-based long-context baseline,
achieving average absolute performance gains of up to 90% for smaller models
(GPT-2) and up to 27% for our largest evaluated model, Qwen-2.5-0.5B. In
general, PERK is more robust to reasoning complexity, length extrapolation, and
the locations of relevant information in contexts. Finally, we show that while
PERK is memory-intensive during training, it scales more efficiently at
inference time than prompt-based long-context inference.
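The nested-loop meta-training described above can be illustrated with a deliberately tiny toy model. This is our own minimal sketch, not the paper's implementation: the "base model" is a frozen linear map `W`, the LoRA adapter is a rank-r product `B @ A`, the inner loop encodes a task's "context" examples into the adapter by gradient descent, and the outer loop uses a first-order (Reptile-style) update of the adapter's initialization in place of full second-order meta-gradients. All names (`inner_adapt`, `make_task`) and hyperparameters are illustrative assumptions.

```python
# Toy sketch of PERK-style meta-training (illustrative assumptions throughout;
# the real method trains LoRA adapters for a transformer LM, not a linear map).
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # model width, LoRA rank (toy sizes)
W = rng.normal(size=(d, d))       # frozen "base model" weights

def loss(A, B, X, Y):
    """MSE of the adapted model (W + B @ A) on a batch of columns (X, Y)."""
    E = (W + B @ A) @ X - Y
    return float(np.mean(E ** 2))

def inner_adapt(A, B, X, Y, lr=0.05, steps=80):
    """Inner loop: encode the 'context' (X, Y) into the LoRA factors A, B."""
    A, B = A.copy(), B.copy()
    for _ in range(steps):
        E = (W + B @ A) @ X - Y
        G = (2.0 / E.size) * (E @ X.T)          # grad of MSE wrt (B @ A)
        A, B = A - lr * (B.T @ G), B - lr * (G @ A.T)
    return A, B

def make_task():
    """A task = a hidden low-rank shift Delta; data comes from (W + Delta)."""
    Delta = 0.2 * rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
    Xc, Xq = rng.normal(size=(d, 16)), rng.normal(size=(d, 16))
    return Xc, (W + Delta) @ Xc, Xq, (W + Delta) @ Xq   # context + query sets

# Outer loop (meta-training): move the adapter initialization toward its
# post-adaptation weights (first-order, Reptile-style) so that future
# inner-loop adaptation encodes a new context quickly.
A0, B0 = 0.01 * rng.normal(size=(r, d)), 0.01 * rng.normal(size=(d, r))
for _ in range(30):
    Xc, Yc, Xq, Yq = make_task()
    A1, B1 = inner_adapt(A0, B0, Xc, Yc)
    A0, B0 = A0 + 0.5 * (A1 - A0), B0 + 0.5 * (B1 - B0)

# Test time: encode a fresh context into the adapter, then answer queries
# from the adapter's parameters rather than from a long prompt.
Xc, Yc, Xq, Yq = make_task()
A1, B1 = inner_adapt(A0, B0, Xc, Yc)
print(f"query loss before/after test-time adaptation: "
      f"{loss(A0, B0, Xq, Yq):.3f} / {loss(A1, B1, Xq, Yq):.3f}")
```

In this sketch the query loss drops after inner-loop adaptation even though the queries were never used for the gradient updates, mirroring the abstract's claim that the updated adapter serves as a parameter-efficient memory the base model can recall from.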