Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

April 10, 2024
Authors: Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
cs.AI

Abstract

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
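The abstract describes Infini-attention as adding a compressive memory to the vanilla attention mechanism and combining masked local attention with a long-term linear-attention path in a single Transformer block. The following is a minimal, single-head PyTorch sketch of that idea, based only on the description above; the class name `InfiniAttentionSketch`, the ELU-plus-one feature map, the scalar `gate`, and the exact memory update and retrieval rules are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch (assumptions noted above): a segment-level compressive memory read
# with linear attention, combined with causal local attention via a learned gate.
import torch
import torch.nn.functional as F
from torch import nn


class InfiniAttentionSketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        # Scalar gate mixing the long-term (memory) and local attention outputs.
        self.gate = nn.Parameter(torch.zeros(1))
        self.dim = dim

    def forward(self, segments):
        """`segments`: list of tensors of shape (batch, seg_len, dim).

        The memory and its normalization term are carried across segments, so
        per-segment cost stays bounded regardless of total context length.
        """
        batch, device = segments[0].shape[0], segments[0].device
        memory = torch.zeros(batch, self.dim, self.dim, device=device)  # compressive memory
        norm_term = torch.zeros(batch, self.dim, device=device)         # normalization term
        outputs = []
        for x in segments:
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            # 1) Retrieve from the compressive memory with a linear-attention read.
            sq = F.elu(q) + 1.0
            mem_out = torch.einsum("bsd,bde->bse", sq, memory)
            denom = torch.einsum("bsd,bd->bs", sq, norm_term).clamp(min=1e-6)
            mem_out = mem_out / denom.unsqueeze(-1)
            # 2) Standard causal (masked) dot-product attention within the segment.
            local_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            # 3) Combine the two paths with a learned sigmoid gate.
            g = torch.sigmoid(self.gate)
            out = g * mem_out + (1.0 - g) * local_out
            outputs.append(self.out_proj(out))
            # 4) Compress the current segment's key-value states into the memory.
            sk = F.elu(k) + 1.0
            memory = memory + torch.einsum("bsd,bse->bde", sk, v)
            norm_term = norm_term + sk.sum(dim=1)
        return torch.cat(outputs, dim=1)


# Example: process a long input as four 256-token segments with a 64-dim toy model.
layer = InfiniAttentionSketch(dim=64)
segments = [torch.randn(2, 256, 64) for _ in range(4)]
y = layer(segments)  # shape: (2, 1024, 64)
```

Because each segment only reads from and then updates a fixed-size memory matrix, this sketch keeps memory and compute bounded per segment, which is the property the abstract highlights for streaming inference over arbitrarily long inputs.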
