PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
July 8, 2025
Authors: Zeming Chen, Angelika Romanou, Gail Weiss, Antoine Bosselut
cs.AI
Abstract
Long-context reasoning requires accurately identifying relevant information
in extensive, noisy input contexts. Previous research shows that using
test-time learning to encode context directly into model parameters can
effectively enable reasoning over noisy information. However, meta-learning
methods for enabling test-time learning are prohibitively memory-intensive,
preventing their application to long-context settings. In this work, we propose
PERK (Parameter Efficient Reasoning over Knowledge), a scalable approach for
learning to encode long input contexts using gradient updates to a lightweight
model adapter at test time. Specifically, PERK employs two nested optimization
loops in a meta-training phase. The inner loop rapidly encodes contexts into a
low-rank adapter (LoRA) that serves as a parameter-efficient memory module for
the base model. Concurrently, the outer loop learns to use the updated adapter
to accurately recall and reason over relevant information from the encoded long
context. Our evaluations on several long-context reasoning tasks show that PERK
significantly outperforms the standard prompt-based long-context baseline,
achieving average absolute performance gains of up to 90% for smaller models
(GPT-2) and up to 27% for our largest evaluated model, Qwen-2.5-0.5B. In
general, PERK is more robust to reasoning complexity, length extrapolation, and
the locations of relevant information in contexts. Finally, we show that while
PERK is memory-intensive during training, it scales more efficiently at
inference time than prompt-based long-context inference.
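To make the nested-loop structure concrete, here is a minimal, runnable PyTorch sketch of PERK-style meta-training on a toy linear model. Everything below (the stand-in base model, the adapter shapes, the losses, learning rates, and step counts) is an illustrative assumption, not the paper's implementation; in PERK itself, the base model would be a Transformer language model and the adapter a LoRA module attached to its weight matrices.

```python
# Toy sketch of nested-loop meta-training with a low-rank adapter.
# All names and hyperparameters are hypothetical stand-ins.
import torch

torch.manual_seed(0)
d, r, inner_lr, steps = 16, 2, 0.1, 3
base = torch.nn.Linear(d, d, bias=False)            # stand-in for the frozen base model
for p in base.parameters():
    p.requires_grad_(False)
A0 = torch.nn.Parameter(torch.zeros(r, d))          # meta-learned LoRA init (down-projection)
B0 = torch.nn.Parameter(0.01 * torch.randn(d, r))   # meta-learned LoRA init (up-projection)
meta_opt = torch.optim.Adam([A0, B0], lr=1e-2)

def forward(x, A, B):
    # Base model output plus the low-rank adapter's contribution.
    return base(x) + (x @ A.t()) @ B.t()

def meta_train_step(ctx_x, ctx_y, q_x, q_y):
    # Inner loop: encode the "context" into the adapter with a few gradient steps.
    A, B = A0, B0
    for _ in range(steps):
        enc_loss = torch.nn.functional.mse_loss(forward(ctx_x, A, B), ctx_y)
        gA, gB = torch.autograd.grad(enc_loss, (A, B), create_graph=True)
        A, B = A - inner_lr * gA, B - inner_lr * gB
    # Outer loop: use the adapted adapter to answer a query about the context,
    # and backpropagate through the inner updates to improve the adapter init.
    outer_loss = torch.nn.functional.mse_loss(forward(q_x, A, B), q_y)
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()
    return outer_loss.item()

# One synthetic meta-training step: context and query drawn from the same task.
ctx_x, q_x = torch.randn(32, d), torch.randn(8, d)
W_task = torch.randn(d, d) / d ** 0.5
ctx_y, q_y = ctx_x @ W_task, q_x @ W_task
print(meta_train_step(ctx_x, ctx_y, q_x, q_y))
```

At test time, only the inner loop would run: the long context is encoded into the adapter once with a few gradient steps, and queries are then answered with the adapted adapter rather than by re-processing the full context in the prompt, which is the source of the inference-time efficiency claim above.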