Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
April 10, 2024
Authors: Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
cs.AI
Abstract
This work introduces an efficient method to scale Transformer-based Large
Language Models (LLMs) to infinitely long inputs with bounded memory and
computation. A key component in our proposed approach is a new attention
technique dubbed Infini-attention. The Infini-attention incorporates a
compressive memory into the vanilla attention mechanism and builds in both
masked local attention and long-term linear attention mechanisms in a single
Transformer block. We demonstrate the effectiveness of our approach on
long-context language modeling benchmarks, 1M sequence length passkey context
block retrieval and 500K length book summarization tasks with 1B and 8B LLMs.
Our approach introduces minimal bounded memory parameters and enables fast
streaming inference for LLMs.
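The abstract describes Infini-attention as combining masked local attention with a long-term linear-attention readout from a compressive memory inside a single Transformer block. The sketch below is a minimal single-head NumPy illustration of that idea under assumed details: an ELU+1 feature map for the linear-attention memory, a simple additive memory update, and a scalar sigmoid gate (`beta`) blending the two outputs. All names and the small epsilon in the denominator are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def elu_plus_one(x):
    # ELU(x) + 1: a non-negative feature map commonly used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(q, k, v, memory, z, beta, causal_mask):
    """Process one segment with a single head (illustrative sketch).

    q, k, v     : (seg_len, d) projections for the current segment
    memory      : (d, d) compressive memory carried across segments
    z           : (d,) normalization term accumulated with the memory
    beta        : scalar gate mixing memory readout and local attention
    causal_mask : (seg_len, seg_len) boolean mask, True where attention is allowed
    """
    d = q.shape[-1]

    # 1) Masked local dot-product attention within the segment.
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(causal_mask, scores, -1e9)
    local_out = softmax(scores) @ v

    # 2) Long-term retrieval from the compressive memory (linear attention form).
    sq = elu_plus_one(q)
    mem_out = (sq @ memory) / ((sq @ z)[:, None] + 1e-6)

    # 3) Fold the current segment's keys/values into the fixed-size memory.
    sk = elu_plus_one(k)
    memory = memory + sk.T @ v
    z = z + sk.sum(axis=0)

    # 4) Gate blends the long-term readout with local attention.
    gate = 1.0 / (1.0 + np.exp(-beta))
    out = gate * mem_out + (1.0 - gate) * local_out
    return out, memory, z

# Example: stream a long sequence segment by segment with constant memory cost.
rng = np.random.default_rng(0)
d, seg_len = 64, 128
memory, z = np.zeros((d, d)), np.zeros(d)
causal_mask = np.tril(np.ones((seg_len, seg_len), dtype=bool))
beta = 0.0  # would be a learned per-head parameter in a trained model
for _ in range(4):
    q, k, v = (rng.standard_normal((seg_len, d)) for _ in range(3))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta, causal_mask)
```

Because the memory matrix and its normalization term have fixed shapes, the per-segment cost stays constant no matter how many segments have already been consumed, which is the property the abstract refers to as bounded memory and fast streaming inference.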