A Comprehensive Survey on Long Context Language Modeling
March 20, 2025
Authors: Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, Tianyu Liu, Fanyu Meng, Wenbo Su, Yingshui Tan, Zili Wang, Jian Yang, Wei Ye, Bo Zheng, Wangchunshu Zhou, Wenhao Huang, Sujian Li, Zhaoxiang Zhang
cs.AI
Abstract
Efficient processing of long contexts has been a persistent pursuit in
Natural Language Processing. With the growing volume of long documents,
dialogues, and other textual data, it is important to develop Long Context
Language Models (LCLMs) that can process and analyze extensive inputs in an
effective and efficient way. In this paper, we present a comprehensive survey
on recent advances in long-context modeling for large language models. Our
survey is structured around three key aspects: how to obtain effective and
efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate
and analyze LCLMs comprehensively. For the first aspect, we discuss data
strategies, architectural designs, and workflow approaches oriented toward
long-context processing. For the second aspect, we provide a detailed examination of
the infrastructure required for LCLM training and inference. For the third
aspect, we present evaluation paradigms for long-context comprehension and
long-form generation, as well as behavioral analysis and mechanistic
interpretability of LCLMs. Beyond these three key aspects, we thoroughly
explore the diverse application scenarios where existing LCLMs have been
deployed and outline promising future development directions. This survey
provides an up-to-date review of the literature on long-context LLMs, which we
hope will serve as a valuable resource for both researchers and engineers. An
associated GitHub repository collecting the latest papers and code repositories
is available at:
https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling