TPTT: Transforming Pretrained Transformer into Titans

Abstract

Recent advances in large language models (LLMs) have led to remarkable progress in natural language processing, but their computational and memory demands remain a significant challenge, particularly for long-context inference. We introduce TPTT (Transforming Pretrained Transformer into Titans), a framework that enhances pretrained Transformer models with efficient linearized attention mechanisms and advanced memory management. TPTT employs techniques such as Memory as Gate (MaG) and mixed linearized attention (LiZA). It is fully compatible with the Hugging Face Transformers library, so any causal LLM can be adapted through parameter-efficient fine-tuning (LoRA) without full retraining. We demonstrate the effectiveness of TPTT on the MMLU benchmark with models of approximately 1 billion parameters, observing substantial improvements in both efficiency and accuracy. For instance, Titans-Llama-3.2-1B achieves a 20% increase in Exact Match (EM) over its baseline. Statistical analyses and comparisons with recent state-of-the-art methods confirm the practical scalability and robustness of TPTT. Code is available at https://github.com/fabienfrfr/tptt and the Python package at https://pypi.org/project/tptt/.
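
To make the idea of combining linearized attention with standard attention more concrete, the following is a minimal NumPy sketch, not the TPTT implementation. It assumes a single head, an `elu(x) + 1` feature map (a common choice in the linear attention literature), and a fixed scalar mixing weight `alpha`; the abstract does not specify how MaG or LiZA compute or combine their outputs, so all function names and the weighting scheme here are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map often used for linear attention
    # (an assumption here, not taken from the abstract).
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_softmax_attention(q, k, v):
    # Standard causal attention: O(T^2) time and memory in sequence length T.
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def causal_linear_attention(q, k, v):
    # Linearized causal attention computed recurrently: the state S and the
    # normalizer z have fixed size, independent of the sequence length.
    phi_q, phi_k = feature_map(q), feature_map(k)
    S = np.zeros((q.shape[1], v.shape[1]))  # running sum of outer(phi_k, v)
    z = np.zeros(q.shape[1])                # running sum of phi_k
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        S += np.outer(phi_k[t], v[t])
        z += phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z)
    return out

def gated_mix(q, k, v, alpha=0.5):
    # Hypothetical gate: a convex combination of the two attention outputs.
    linear_out = causal_linear_attention(q, k, v)
    softmax_out = causal_softmax_attention(q, k, v)
    return alpha * linear_out + (1.0 - alpha) * softmax_out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
mixed = gated_mix(q, k, v, alpha=0.5)
```

The recurrent form is what makes linearized attention attractive for long contexts: each step updates a fixed-size state instead of attending over all previous tokens. In a real setup the gate would more plausibly be learned or scheduled during fine-tuning than fixed by hand.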
