PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning

Abstract

Long-context reasoning requires accurately identifying relevant information in extensive, noisy input contexts. Previous research shows that using test-time learning to encode context directly into model parameters can effectively enable reasoning over noisy information. However, meta-learning methods for enabling test-time learning are prohibitively memory-intensive, preventing their application to long context settings. In this work, we propose PERK (Parameter Efficient Reasoning over Knowledge), a scalable approach for learning to encode long input contexts using gradient updates to a lightweight model adapter at test time. Specifically, PERK employs two nested optimization loops in a meta-training phase. The inner loop rapidly encodes contexts into a low-rank adapter (LoRA) that serves as a parameter-efficient memory module for the base model. Concurrently, the outer loop learns to use the updated adapter to accurately recall and reason over relevant information from the encoded long context. Our evaluations on several long-context reasoning tasks show that PERK significantly outperforms the standard prompt-based long-context baseline, achieving average absolute performance gains of up to 90% for smaller models (GPT-2) and up to 27% for our largest evaluated model, Qwen-2.5-0.5B. In general, PERK is more robust to reasoning complexity, length extrapolation, and the locations of relevant information in contexts. Finally, we show that while PERK is memory-intensive during training, it scales more efficiently at inference time than prompt-based long-context inference.
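The nested optimization described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract. The "base model" is a single frozen linear map `W`. A "context" is a set of input/output pairs produced by a task-specific rank-1 change to `W`, and the loss is mean squared error. The outer loop uses a first-order (FOMAML-style) update rather than backpropagating through the inner loop. The function names `loss_and_grads`, `inner_loop`, `sample_task` and `meta_train` are hypothetical. This sketch is not the PERK implementation, whose losses and gradient handling are not specified here.

```python
import numpy as np

# Toy sizes (assumptions for illustration): hidden size D, LoRA rank R.
D, R = 8, 2
rng = np.random.default_rng(0)

# Frozen "base model": one linear map that is never updated.
W = rng.normal(size=(D, D)) / np.sqrt(D)


def loss_and_grads(x, y, A, B):
    """MSE of the adapted map x -> x @ (W + B @ A).T, plus grads w.r.t. A and B."""
    err = x @ (W + B @ A).T - y              # (n, D) residuals
    g_M = 2.0 * err.T @ x / err.size         # dL/d(W + B @ A), shape (D, D)
    return np.mean(err ** 2), B.T @ g_M, g_M @ A.T  # loss, dL/dA, dL/dB


def inner_loop(ctx_x, ctx_y, A0, B0, lr=0.1, steps=20):
    """Inner loop: encode one context into the low-rank adapter (A, B) only."""
    A, B = A0.copy(), B0.copy()
    for _ in range(steps):
        _, gA, gB = loss_and_grads(ctx_x, ctx_y, A, B)
        A, B = A - lr * gA, B - lr * gB
    return A, B


def sample_task(n_ctx=16, n_query=4):
    """Hypothetical task: pairs from W plus a task-specific rank-1 change,
    split into context pairs (to encode) and held-out query pairs."""
    u = rng.normal(size=(D, 1))
    v = rng.normal(size=(D, 1)) / np.sqrt(D)
    x = rng.normal(size=(n_ctx + n_query, D))
    y = x @ (W + u @ v.T).T
    return (x[:n_ctx], y[:n_ctx]), (x[n_ctx:], y[n_ctx:])


def meta_train(A0, B0, meta_lr=0.05, meta_steps=50):
    """Outer loop (first-order approximation): adjust the adapter initialization
    so that, after the inner loop encodes a context, query loss is low."""
    for _ in range(meta_steps):
        (cx, cy), (qx, qy) = sample_task()
        A, B = inner_loop(cx, cy, A0, B0)
        _, gA, gB = loss_and_grads(qx, qy, A, B)  # grads at the adapted adapter
        A0, B0 = A0 - meta_lr * gA, B0 - meta_lr * gB
    return A0, B0


# Usage: standard LoRA-style init (B = 0, so W + B @ A equals W at the start).
A0 = rng.normal(size=(R, D)) / np.sqrt(D)
B0 = np.zeros((D, R))
(cx, cy), _ = sample_task()
before, _, _ = loss_and_grads(cx, cy, A0, B0)
A, B = inner_loop(cx, cy, A0, B0)
after, _, _ = loss_and_grads(cx, cy, A, B)
print(f"context loss before/after encoding: {before:.3f} / {after:.3f}")
A0, B0 = meta_train(A0, B0)
```

Only the adapter is trained. For a d×k weight it holds r(d+k) parameters instead of dk (32 versus 64 in this toy). The first-order outer update is a simplification. Exact meta-gradients would have to backpropagate through every inner step and keep each intermediate adapter state, which is the kind of memory cost the abstract attributes to meta-learning.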
