Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
December 14, 2023
Authors: Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu
cs.AI
Abstract
This paper introduces a novel approach to enhance the capabilities of Large
Language Models (LLMs) in processing and understanding extensive text
sequences, a critical aspect in applications requiring deep comprehension and
synthesis of large volumes of information. Recognizing the inherent challenges
in extending the context window for LLMs, primarily built on Transformer
architecture, we propose a new model architecture, referred to as Zebra. This
architecture efficiently manages the quadratic time and memory complexity
issues associated with full attention in the Transformer by employing grouped
local-global attention layers. Our model, akin to a zebra's alternating
stripes, balances local and global attention layers, significantly reducing
computational requirements and memory consumption. Comprehensive experiments,
including pretraining from scratch, continuation of long context adaptation
training, and long instruction tuning, are conducted to evaluate Zebra's
performance. The results show that Zebra achieves comparable or superior
performance on both short and long sequence benchmarks, while also enhancing
training and inference efficiency.
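
The abstract describes the architecture only at a high level, so the following is a minimal, hypothetical PyTorch sketch of one way layerwise grouped local-global attention could be arranged: within each group of layers, most use windowed (local) attention and one uses full causal (global) attention. All names and hyperparameters here (ZebraStyleStack, group_size, window, d_model, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of grouped local-global attention:
# each group of `group_size` layers has (group_size - 1) local layers
# followed by one global layer, loosely mirroring a zebra's stripes.
from typing import Optional
import torch
import torch.nn as nn


def local_attention_mask(seq_len: int, window: int, device=None) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where a query may attend a key.
    Causal and restricted to the most recent `window` tokens."""
    idx = torch.arange(seq_len, device=device)
    rel = idx[None, :] - idx[:, None]          # key position minus query position
    return (rel <= 0) & (rel > -window)        # causal AND inside the window


class AttentionLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, window: Optional[int]):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window                   # None => global (full causal) attention
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        if self.window is None:
            # Global layer: plain causal mask over the whole sequence.
            allowed = torch.ones(seq_len, seq_len, dtype=torch.bool,
                                 device=x.device).tril()
        else:
            allowed = local_attention_mask(seq_len, self.window, x.device)
        attn_mask = ~allowed                   # MultiheadAttention: True = do NOT attend
        h, _ = self.attn(x, x, x, attn_mask=attn_mask, need_weights=False)
        return self.norm(x + h)                # residual + norm (details are illustrative)


class ZebraStyleStack(nn.Module):
    """Alternates groups of local layers with a single global layer."""
    def __init__(self, n_layers=8, group_size=4, d_model=256, n_heads=4, window=128):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionLayer(
                d_model, n_heads,
                window=None if (i % group_size == group_size - 1) else window,
            )
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 512, 256)               # (batch, seq_len, d_model)
    print(ZebraStyleStack()(x).shape)          # torch.Size([2, 512, 256])
```

Under this kind of arrangement, each local layer attends over a fixed window rather than the full sequence, so its cost scales with seq_len times window instead of seq_len squared, which is the source of the reduced computation and memory that the abstract attributes to replacing most full-attention layers.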