A Comprehensive Survey on Long Context Language Modeling
March 20, 2025
Authors: Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, Tianyu Liu, Fanyu Meng, Wenbo Su, Yingshui Tan, Zili Wang, Jian Yang, Wei Ye, Bo Zheng, Wangchunshu Zhou, Wenhao Huang, Sujian Li, Zhaoxiang Zhang
cs.AI
Abstract
Efficient processing of long contexts has been a persistent pursuit in
Natural Language Processing. With the growing volume of long documents,
dialogues, and other textual data, it is important to develop Long Context
Language Models (LCLMs) that can process and analyze extensive inputs in an
effective and efficient way. In this paper, we present a comprehensive survey
on recent advances in long-context modeling for large language models. Our
survey is structured around three key aspects: how to obtain effective and
efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate
and analyze LCLMs comprehensively. For the first aspect, we discuss data
strategies, architectural designs, and workflow approaches oriented toward
long-context processing. For the second aspect, we provide a detailed examination of
the infrastructure required for LCLM training and inference. For the third
aspect, we present evaluation paradigms for long-context comprehension and
long-form generation, as well as behavioral analysis and mechanistic
interpretability of LCLMs. Beyond these three key aspects, we thoroughly
explore the diverse application scenarios where existing LCLMs have been
deployed and outline promising future development directions. This survey
provides an up-to-date review of the literature on long-context LLMs, which we
hope will serve as a valuable resource for both researchers and engineers. An
associated GitHub repository collecting the latest papers and code repositories
is available at:
https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling