LM2: Large Memory Models

February 9, 2025
Authors: Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis
cs.AI

Abstract

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation repository, interacting with input tokens via cross-attention and updating through gating mechanisms. To preserve the Transformer's general-purpose capabilities, LM2 maintains the original information flow while integrating a complementary memory pathway. Experimental results on the BABILong benchmark demonstrate that the LM2 model outperforms both the memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3% on average across tasks. LM2 exhibits exceptional capabilities in multi-hop inference, numerical reasoning, and large-context question answering. On the MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model, demonstrating that its memory module does not degrade performance on general tasks. Further, our analysis explores memory interpretability, the effectiveness of the memory module, and test-time behavior. Our findings emphasize the importance of explicit memory in enhancing Transformer architectures.
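To make the described architecture concrete, below is a minimal PyTorch sketch of a decoder block with the kind of memory pathway the abstract outlines: a bank of memory slots that input tokens read via cross-attention, a gated write-back that updates the memory, and a residual structure that keeps the original self-attention/feed-forward flow intact. This is an illustrative assumption, not the authors' implementation; the class name MemoryAugmentedBlock, the slot count n_mem_slots, and the exact sigmoid gating form are all made up for this sketch.

```python
# Illustrative sketch only (not the LM2 authors' code): cross-attention memory
# read, gated memory update, and an unchanged decoder information flow.
from typing import Optional

import torch
import torch.nn as nn


class MemoryAugmentedBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, n_mem_slots: int = 64):
        super().__init__()
        # Learnable initial memory bank, shared across the batch (assumed design).
        self.memory = nn.Parameter(torch.randn(n_mem_slots, d_model) * 0.02)
        # Original decoder path: causal self-attention + feed-forward.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        # Complementary memory pathway: tokens read from memory via cross-attention.
        self.mem_read = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Memory update: slots attend to tokens, then a gate controls the write.
        self.mem_write = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor] = None):
        # x: (batch, seq_len, d_model); memory: (batch, n_mem_slots, d_model)
        batch = x.size(0)
        if memory is None:
            memory = self.memory.unsqueeze(0).expand(batch, -1, -1)

        # 1) Original information flow: causal self-attention, then residual + norm.
        causal = torch.triu(
            torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + h)

        # 2) Memory read: tokens query the memory bank via cross-attention,
        #    added as a residual so the base pathway is preserved.
        read, _ = self.mem_read(x, memory, memory)
        x = self.norm2(x + read)

        x = self.norm3(x + self.ffn(x))

        # 3) Gated memory update: slots attend to the tokens and a sigmoid gate
        #    decides how much new content overwrites each slot.
        write, _ = self.mem_write(memory, x, x)
        g = torch.sigmoid(self.gate(torch.cat([memory, write], dim=-1)))
        memory = g * write + (1.0 - g) * memory
        return x, memory
```

In a full model, blocks like this would be stacked and the returned memory carried across successive long-context segments, which is consistent with (though not identical to) the memory flow the abstract describes.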