TPTT: Transforming Pretrained Transformer into Titans

Abstract

Recent advances in large language models (LLMs) have led to remarkable progress in natural language processing, but their computational and memory demands remain a significant challenge, particularly for long-context inference. We introduce TPTT (Transforming Pretrained Transformer into Titans), a novel framework for enhancing pretrained Transformer models with efficient linearized attention mechanisms and advanced memory management. TPTT employs techniques such as Memory as Gate (MaG) and mixed linearized attention (LiZA). It is fully compatible with the Hugging Face Transformers library, enabling seamless adaptation of any causal LLM through parameter-efficient fine-tuning (LoRA) without full retraining. We demonstrate the effectiveness of TPTT on the MMLU benchmark with models of approximately 1 billion parameters, observing substantial improvements in both efficiency and accuracy. For instance, Titans-Llama-3.2-1B achieves a 20% increase in Exact Match (EM) over its baseline. Statistical analyses and comparisons with recent state-of-the-art methods confirm the practical scalability and robustness of TPTT. The code is available at https://github.com/fabienfrfr/tptt. The Python package is available at https://pypi.org/project/tptt/.
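To make the idea of mixing linearized attention with standard attention more concrete, the following is a minimal, dependency-free sketch. It is not the TPTT package's API and is not taken from the paper: the function names, the `elu(x) + 1` feature map, and the scalar gate `alpha` that blends the two outputs are all illustrative assumptions. It only shows one plausible way a gate could combine a causal linear-attention output (computed with a fixed-size running state) with a causal softmax-attention output.

```python
import math

def softmax_causal_attention(q, k, v):
    """Causal softmax attention on lists of vectors (cost grows with sequence length)."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against all keys up to and including position t.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)
        w = [math.exp(x - m) for x in scores]
        z = sum(w)
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) / z
                    for j in range(len(v[0]))])
    return out

def feature_map(x):
    """Assumed positive feature map: elu(x) + 1."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_causal_attention(q, k, v):
    """Causal linear attention written as a recurrence over a fixed-size state."""
    dk, dv = len(k[0]), len(v[0])
    state = [[0.0] * dv for _ in range(dk)]  # running sum of phi(k) v^T
    norm = [0.0] * dk                        # running sum of phi(k)
    out = []
    for qt, kt, vt in zip(q, k, v):
        fq, fk = feature_map(qt), feature_map(kt)
        for i in range(dk):
            norm[i] += fk[i]
            for j in range(dv):
                state[i][j] += fk[i] * vt[j]
        denom = sum(fq[i] * norm[i] for i in range(dk))
        out.append([sum(fq[i] * state[i][j] for i in range(dk)) / denom
                    for j in range(dv)])
    return out

def gated_mix(o_lin, o_soft, alpha):
    """Hypothetical gate: alpha weights the linear path, (1 - alpha) the softmax path."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_l, row_s)]
            for row_l, row_s in zip(o_lin, o_soft)]

# Toy inputs: 3 positions, key/query dimension 2, value dimension 2.
q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

o_lin = linear_causal_attention(q, k, v)
o_soft = softmax_causal_attention(q, k, v)
mixed = gated_mix(o_lin, o_soft, alpha=0.5)
```

In this sketch the linear path keeps only `state` and `norm` between positions, so its memory does not grow with context length, while the softmax path revisits every earlier key. Setting `alpha` to 0 recovers the softmax output unchanged; how the gate is actually parameterized or trained in TPTT is not specified in the abstract.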
