
TPTT: Transforming Pretrained Transformer into Titans

June 21, 2025
Author: Fabien Furfaro
cs.AI

Abstract

Recent advances in large language models (LLMs) have led to remarkable progress in natural language processing, but their computational and memory demands remain a significant challenge, particularly for long-context inference. We introduce TPTT (Transforming Pretrained Transformer into Titans), a novel framework for enhancing pretrained Transformer models with efficient linearized attention mechanisms and advanced memory management. TPTT employs techniques such as Memory as Gate (MaG) and mixed linearized attention (LiZA). It is fully compatible with the Hugging Face Transformers library, enabling seamless adaptation of any causal LLM through parameter-efficient fine-tuning (LoRA) without full retraining. We show the effectiveness of TPTT on the MMLU benchmark with models of approximately 1 billion parameters, observing substantial improvements in both efficiency and accuracy. For instance, Titans-Llama-3.2-1B achieves a 20% increase in Exact Match (EM) over its baseline. Statistical analyses and comparisons with recent state-of-the-art methods confirm the practical scalability and robustness of TPTT. Code is available at https://github.com/fabienfrfr/tptt and the Python package at https://pypi.org/project/tptt/.
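The abstract emphasizes that TPTT plugs into the Hugging Face Transformers stack and adapts an existing causal LLM with LoRA rather than retraining it from scratch. The sketch below is only a minimal illustration of that general workflow using the `transformers` and `peft` libraries; it does not reproduce the actual `tptt` package API. The model id `meta-llama/Llama-3.2-1B` (inferred from the Titans-Llama-3.2-1B name) and the commented-out injection call are assumptions; see the linked repository for the real entry points.

```python
# Hedged sketch of the adaptation workflow described in the abstract:
# load a pretrained causal LM, then fine-tune it parameter-efficiently
# with LoRA instead of updating all weights. The tptt-specific step is
# hypothetical and therefore left as a commented-out placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.2-1B"  # assumed ~1B-parameter baseline
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical: this is where TPTT would inject its linearized attention
# (LiZA) and Memory-as-Gate (MaG) modules into the loaded model, e.g.
# model = tptt.inject_liza(model)  # placeholder, not the package's real API

# Standard LoRA setup: only small low-rank adapters are trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```

Because only the low-rank adapter weights are updated, this kind of retrofit avoids the full retraining that the abstract says TPTT is designed to sidestep.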