LM2: Large Memory Models

Abstract

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module. It aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a repository of contextual representations, interacting with input tokens via cross-attention and updating through gating mechanisms. To preserve the Transformer's general-purpose capabilities, LM2 maintains the original information flow while integrating a complementary memory pathway. Experimental results on the BABILong benchmark show that, on average across tasks, the LM2 model outperforms the memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3%. LM2 exhibits exceptional capabilities in multi-hop inference, numerical reasoning, and large-context question answering. On the MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model, demonstrating that its memory module does not degrade performance on general tasks. Our analysis further explores memory interpretability, the effectiveness of the memory modules, and test-time behavior. Our findings emphasize the importance of explicit memory in enhancing Transformer architectures.
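
The abstract names two mechanisms: tokens read from the memory via cross-attention, and the memory is updated through gates, while the original token stream is preserved. The NumPy sketch below illustrates one way such a pathway could be wired. It is a minimal illustration under stated assumptions, not the authors' implementation: the single attention head, the projection names (Wq, Wk, Wv, Wo, Wg), the sigmoid/tanh gating form, and the helper names init_params and memory_step are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d, rng):
    # Hypothetical parameter set: one (d, d) matrix per projection or gate.
    names = ("Wq", "Wk", "Wv", "Wo", "Wg")
    return {n: rng.normal(scale=d ** -0.5, size=(d, d)) for n in names}

def memory_step(tokens, memory, p):
    """One read/write step. tokens: (T, d), memory: (N, d)."""
    d = tokens.shape[-1]

    # Read: tokens (queries) attend over memory slots (keys/values).
    q = tokens @ p["Wq"]                       # (T, d)
    k = memory @ p["Wk"]                       # (N, d)
    v = memory @ p["Wv"]                       # (N, d)
    scores = q @ k.T / np.sqrt(d)              # (T, N)
    read = softmax(scores, axis=-1) @ v        # (T, d)

    # Complementary pathway: the original token stream is kept as a
    # residual, and the gated memory readout is added on top of it.
    out_gate = sigmoid(read @ p["Wo"])         # (T, d)
    tokens_out = tokens + out_gate * read

    # Write (assumed form): each slot attends over the tokens to build a
    # candidate, then a gate blends the candidate with the old slot content.
    write_weights = softmax(scores.T, axis=-1)     # (N, T)
    candidate = np.tanh(write_weights @ tokens)    # (N, d)
    update_gate = sigmoid(candidate @ p["Wg"])     # (N, d)
    memory_new = update_gate * candidate + (1.0 - update_gate) * memory
    return tokens_out, memory_new

# Usage: 5 tokens, 4 memory slots, width 8.
rng = np.random.default_rng(0)
params = init_params(8, rng)
tokens = rng.normal(size=(5, 8))
memory = np.zeros((4, 8))
tokens_out, memory = memory_step(tokens, memory, params)
print(tokens_out.shape, memory.shape)
```

In this sketch an all-zero memory yields a zero readout, so the first step leaves the token stream unchanged; the memory only starts to influence the tokens after it has been written to. That mirrors, in a simplified way, the idea of keeping the original information flow intact while adding a memory pathway alongside it.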
