NGM: A Plug-and-Play, Training-Free Memory Module for LLMs

Abstract



Recent studies have introduced conditional memory modules that decouple knowledge storage from neural computation, enabling more direct access to knowledge. Compared with Mixture-of-Experts (MoE) models, which rely on dynamic computation paths, explicit lookup offers a more efficient knowledge-retrieval mechanism. However, these approaches still depend on learned memory embeddings, which require additional training and limit flexibility. To address this, we propose N-gram Memory (NGM), a training-free, plug-and-play module composed of a Causal N-Gram Encoder and a Cosine-Gated Memory Injector. The Causal N-Gram Encoder constructs N-gram representations by directly averaging the backbone model's pretrained token embeddings, eliminating the need to train separate N-gram embeddings from scratch; as a result, NGM requires neither an additional memory table nor a retrieval pipeline. The Cosine-Gated Memory Injector then uses a non-parametric cosine gate with ReLU to modulate these N-gram embeddings as they are injected into the contextual representations. We evaluate NGM on Qwen3 models ranging from 0.6B to 14B parameters across eight benchmarks. NGM improves average performance by 0.5 to 1.2 points, with particularly clear gains on code generation and knowledge-intensive tasks (e.g., +3.0 on LiveCodeBench and +3.03 on GPQA for Qwen3-14B). NGM also improves performance on multimodal benchmarks (e.g., +1.53 on MMStar for Qwen3-VL-2B).
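The two components described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation; it assumes a single fixed window size `n`, that positions near the start of the sequence simply average over the tokens available so far, that the hidden size equals the embedding size, and that injection is a residual update h' = h + ReLU(cos(h, m)) * m. The function names `causal_ngram_encode` and `cosine_gated_inject` are hypothetical, and the toy vocabulary and hidden states are made up for illustration.

```python
import math


def causal_ngram_encode(token_ids, embedding_table, n=3):
    """Hypothetical encoder: average the pretrained embeddings of each token
    and its up-to n-1 predecessors (causal, so position i never sees i+1)."""
    dim = len(embedding_table[0])
    memory = []
    for i in range(len(token_ids)):
        # Assumption: early positions average over however many tokens exist.
        window = token_ids[max(0, i - n + 1): i + 1]
        total = [0.0] * dim
        for tok in window:
            for d, value in enumerate(embedding_table[tok]):
                total[d] += value
        memory.append([v / len(window) for v in total])
    return memory


def cosine(a, b, eps=1e-8):
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def cosine_gated_inject(hidden_states, memory):
    """Hypothetical injector with no learned parameters:
    h' = h + ReLU(cos(h, m)) * m, applied position by position."""
    out = []
    for h, m in zip(hidden_states, memory):
        # ReLU closes the gate when the N-gram memory points away from the context.
        gate = max(0.0, cosine(h, m))
        out.append([hv + gate * mv for hv, mv in zip(h, m)])
    return out


# Toy usage: 4-token vocabulary with made-up 2-dimensional "pretrained" embeddings.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
mem = causal_ngram_encode([0, 1, 3], E, n=2)
# mem is [[1.0, 0.0], [0.5, 0.5], [-0.5, 0.5]]
out = cosine_gated_inject([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], mem)
# out[2] stays [1.0, 0.0]: the cosine there is negative, so the gate is 0
print(mem)
print(out)
```

Because the encoder only reads rows of the backbone's existing embedding table and the gate is computed from a cosine similarity, nothing in this sketch has trainable weights, which mirrors the training-free property the abstract describes.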