Overflow Prevention Enhances Long-Context Recurrent LLMs

Abstract

A recent trend in LLMs is the development of recurrent sub-quadratic models that improve the efficiency of long-context processing. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, they underutilize long contexts. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input, can mitigate recurrent memory failures and is effective for many long-context tasks. On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results on the challenging LongBench v2 benchmark, showing performance competitive with Transformers of equivalent size. Furthermore, our findings raise questions about whether recurrent models genuinely exploit long-range dependencies, as our single-chunk strategy delivers stronger performance, even in tasks that presumably require cross-context relations.
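The single-chunk idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the most relevant chunk is identified, so everything below is an assumption made for illustration: `overlap_score` is a placeholder word-overlap heuristic rather than the paper's selection criterion, whitespace splitting stands in for a real tokenizer, and `generate` is a hypothetical callable representing a recurrent LLM.

```python
from typing import Callable, List, Sequence


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def overlap_score(chunk: Sequence[str], query: Sequence[str]) -> float:
    """Placeholder relevance score (an assumption, not the paper's criterion):
    the fraction of distinct query tokens that also appear in the chunk."""
    query_set = set(query)
    if not query_set:
        return 0.0
    return len(query_set & set(chunk)) / len(query_set)


def select_chunk(
    tokens: Sequence[str],
    query: Sequence[str],
    chunk_size: int,
    score: Callable[[Sequence[str], Sequence[str]], float] = overlap_score,
) -> List[str]:
    """Return the highest-scoring chunk; ties go to the earliest chunk."""
    chunks = split_into_chunks(tokens, chunk_size)
    if not chunks:
        return []
    return max(chunks, key=lambda chunk: score(chunk, query))


def single_chunk_answer(
    context: str,
    query: str,
    generate: Callable[[str, str], str],
    chunk_size: int = 2000,
) -> str:
    """Answer the query from one selected chunk instead of the full context.

    `generate(chunk_text, query)` is a hypothetical stand-in for a model call.
    """
    best = select_chunk(context.split(), query.split(), chunk_size)
    return generate(" ".join(best), query)
```

Because the model only ever sees one chunk of bounded length, its fixed-size recurrent state is never asked to hold more than `chunk_size` tokens of context, which is the intuition behind preventing memory overflow. A stronger scoring function can be swapped in through the `score` parameter.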
