Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
December 14, 2023
Authors: Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu
cs.AI
Abstract
This paper introduces a novel approach to enhance the capabilities of Large
Language Models (LLMs) in processing and understanding extensive text
sequences, a critical aspect in applications requiring deep comprehension and
synthesis of large volumes of information. Recognizing the inherent challenges
in extending the context window for LLMs, primarily built on Transformer
architecture, we propose a new model architecture, referred to as Zebra. This
architecture efficiently manages the quadratic time and memory complexity
issues associated with full attention in the Transformer by employing grouped
local-global attention layers. Our model, akin to a zebra's alternating
stripes, balances local and global attention layers, significantly reducing
computational requirements and memory consumption. Comprehensive experiments,
including pretraining from scratch, continuation of long context adaptation
training, and long instruction tuning, are conducted to evaluate Zebra's
performance. The results show that Zebra achieves comparable or superior
performance on both short and long sequence benchmarks, while also enhancing
training and inference efficiency.
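
The abstract describes the architecture only at a high level, so the following is a minimal, hypothetical PyTorch sketch of one way layerwise grouped local-global attention could be arranged: within each group of layers, most use windowed (local) attention and one uses full causal (global) attention. All names and hyperparameters here (ZebraStyleStack, group_size, window, d_model, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of grouped local-global attention:
# each group of `group_size` layers has (group_size - 1) local layers
# followed by one global layer, loosely mirroring a zebra's stripes.
from typing import Optional
import torch
import torch.nn as nn


def local_attention_mask(seq_len: int, window: int, device=None) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where a query may attend a key.
    Causal and restricted to the most recent `window` tokens."""
    idx = torch.arange(seq_len, device=device)
    rel = idx[None, :] - idx[:, None]          # key position minus query position
    return (rel <= 0) & (rel > -window)        # causal AND inside the window


class AttentionLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, window: Optional[int]):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window                   # None => global (full causal) attention
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        if self.window is None:
            # Global layer: plain causal mask over the whole sequence.
            allowed = torch.ones(seq_len, seq_len, dtype=torch.bool,
                                 device=x.device).tril()
        else:
            allowed = local_attention_mask(seq_len, self.window, x.device)
        attn_mask = ~allowed                   # MultiheadAttention: True = do NOT attend
        h, _ = self.attn(x, x, x, attn_mask=attn_mask, need_weights=False)
        return self.norm(x + h)                # residual + norm (details are illustrative)


class ZebraStyleStack(nn.Module):
    """Alternates groups of local layers with a single global layer."""
    def __init__(self, n_layers=8, group_size=4, d_model=256, n_heads=4, window=128):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionLayer(
                d_model, n_heads,
                window=None if (i % group_size == group_size - 1) else window,
            )
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 512, 256)               # (batch, seq_len, d_model)
    print(ZebraStyleStack()(x).shape)          # torch.Size([2, 512, 256])
```

Under this kind of arrangement, each local layer attends over a fixed window rather than the full sequence, so its cost scales with seq_len times window instead of seq_len squared, which is the source of the reduced computation and memory that the abstract attributes to replacing most full-attention layers.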