Overflow Prevention Enhances Long-Context Recurrent LLMs

Abstract


A recent trend in LLMs is the development of recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, they underutilize long contexts. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input, can mitigate recurrent memory failures and be effective for many long-context tasks: on LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results on the challenging LongBench v2 benchmark, with performance competitive with equivalently sized Transformers. Furthermore, our findings raise questions about whether recurrent models genuinely exploit long-range dependencies, as our single-chunk strategy delivers stronger performance, even in tasks that presumably require cross-context relations.
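The chunk-based inference procedure described above can be illustrated with a minimal sketch. It assumes a model exposing two hypothetical functions, `next_token_probs` and `generate` (not a real library API). Each chunk, followed by the query, is scored by the entropy of the resulting next-token distribution, and only the lowest-entropy chunk is used for decoding. This selection rule is an assumption chosen for illustration and is not necessarily the authors' exact criterion. `toy_probs` and `toy_generate` are hypothetical stand-ins for a real recurrent model.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption, not a real library API):
#   next_token_probs(tokens) -> next-token probability distribution
#   generate(tokens)         -> decoded answer string
ProbsFn = Callable[[List[str]], List[float]]
GenerateFn = Callable[[List[str]], str]


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split the context into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_chunk(chunks: List[List[str]], query: Sequence[str],
                 next_token_probs: ProbsFn) -> int:
    """Index of the chunk whose (chunk + query) input yields the lowest-entropy,
    i.e. most confident, next-token distribution (assumed selection rule)."""
    if not chunks:
        raise ValueError("context is empty")
    scores = [entropy(next_token_probs(chunk + list(query))) for chunk in chunks]
    return min(range(len(chunks)), key=lambda i: scores[i])


def chunked_answer(context: Sequence[str], query: Sequence[str], chunk_size: int,
                   next_token_probs: ProbsFn, generate: GenerateFn) -> str:
    """Decode from the single selected chunk only, so the fixed-size recurrent
    state never has to absorb the full context."""
    chunks = split_into_chunks(context, chunk_size)
    best = select_chunk(chunks, query, next_token_probs)
    return generate(chunks[best] + list(query))


# Hypothetical toy stand-ins for a real model, for illustration only.
def toy_probs(tokens: List[str]) -> List[float]:
    # Confident only when the input contains the key fact.
    return [0.9, 0.05, 0.05] if "password=42" in tokens else [1 / 3, 1 / 3, 1 / 3]


def toy_generate(tokens: List[str]) -> str:
    # "Answers" by copying the key fact if it is present in the input.
    return next((t for t in tokens if t.startswith("password=")), "unknown")


if __name__ == "__main__":
    context = ["filler"] * 10 + ["password=42"] + ["filler"] * 9
    query = ["what", "is", "the", "password", "?"]
    print(chunked_answer(context, query, 4, toy_probs, toy_generate))  # password=42
```

Because each chunk is scored independently, a real system could process the chunks as a batch. In this sketch, the chunk size and the selection rule are the main knobs.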
