

A Comprehensive Survey on Long Context Language Modeling

March 20, 2025
Authors: Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, Tianyu Liu, Fanyu Meng, Wenbo Su, Yingshui Tan, Zili Wang, Jian Yang, Wei Ye, Bo Zheng, Wangchunshu Zhou, Wenhao Huang, Sujian Li, Zhaoxiang Zhang
cs.AI

Abstract

Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing volume of long documents, dialogues, and other textual data, it is increasingly important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs both effectively and efficiently. In this paper, we present a comprehensive survey of recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented toward long-context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanistic interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios where existing LCLMs have been deployed and outline promising directions for future development. This survey provides an up-to-date review of the literature on long-context LLMs, which we hope will serve as a valuable resource for both researchers and engineers. An associated GitHub repository collecting the latest papers and code repositories is available at: https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling