
TPTT: Transforming Pretrained Transformer into Titans

June 21, 2025
Author: Fabien Furfaro
cs.AI

Abstract

Recent advances in large language models (LLMs) have led to remarkable progress in natural language processing, but their computational and memory demands remain a significant challenge, particularly for long-context inference. We introduce TPTT (Transforming Pretrained Transformer into Titans), a novel framework for enhancing pretrained Transformer models with efficient linearized attention mechanisms and advanced memory management. TPTT employs techniques such as Memory as Gate (MaG) and mixed linearized attention (LiZA). It is fully compatible with the Hugging Face Transformers library, enabling seamless adaptation of any causal LLM through parameter-efficient fine-tuning (LoRA) without full retraining. We show the effectiveness of TPTT on the MMLU benchmark with models of approximately 1 billion parameters, observing substantial improvements in both efficiency and accuracy. For instance, Titans-Llama-3.2-1B achieves a 20% increase in Exact Match (EM) over its baseline. Statistical analyses and comparisons with recent state-of-the-art methods confirm the practical scalability and robustness of TPTT. Code is available at https://github.com/fabienfrfr/tptt, and a Python package is available at https://pypi.org/project/tptt/.
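The abstract does not spell out implementation details, so the PyTorch sketch below is only a rough conceptual illustration of what a gated mix of softmax attention and a linear-attention memory could look like. The feature map, the scalar gate, and all function names are assumptions made for illustration and are not taken from the TPTT codebase.

```python
# Conceptual sketch (not the authors' implementation): blend standard causal
# softmax attention with an O(n) linear-attention memory via a gate in [0, 1].
import torch
import torch.nn.functional as F


def linear_attention(q, k, v):
    """Causal linear attention using a running key-value memory state."""
    # elu(x) + 1 is a common positive feature map for linearized attention (assumption).
    q, k = F.elu(q) + 1, F.elu(k) + 1
    # Cumulative outer-product memory S_t = sum_{i <= t} k_i v_i^T.
    kv = torch.einsum("bnd,bne->bnde", k, v).cumsum(dim=1)
    z = k.cumsum(dim=1)  # normalizer sum_{i <= t} k_i
    out = torch.einsum("bnd,bnde->bne", q, kv)
    denom = torch.einsum("bnd,bnd->bn", q, z).unsqueeze(-1) + 1e-6
    return out / denom


def gated_mixed_attention(q, k, v, gate):
    """Mix softmax attention and linear attention with a learned gate."""
    n = q.size(1)
    causal = torch.tril(torch.ones(n, n)).bool()
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~causal, float("-inf"))
    soft_out = scores.softmax(dim=-1) @ v
    lin_out = linear_attention(q, k, v)
    return gate * lin_out + (1 - gate) * soft_out


if __name__ == "__main__":
    b, n, d = 2, 16, 32
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    gate = torch.sigmoid(torch.zeros(1))  # a learnable gate parameter in practice
    print(gated_mixed_attention(q, k, v, gate).shape)  # torch.Size([2, 16, 32])
```

Per the abstract, TPTT applies this kind of linearization and memory gating to an already pretrained causal LLM and adapts it with LoRA through the Hugging Face Transformers ecosystem, rather than training attention from scratch as in this toy example.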