Thus Spake Long-Context Large Language Model
February 24, 2025
作者: Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu
cs.AI
Abstract
Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and it offers immense opportunities for Large Language Models (LLMs), giving them the potential for lifelong learning akin to humans. Unfortunately, the pursuit of a long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies.
Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of LLMs and humans' attempts to transcend their mortality. In this survey, we will illustrate how LLMs struggle between the tremendous need for a longer context and the equal need to accept the fact that it is ultimately finite. To achieve this, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation, showcasing the full spectrum of long-context technologies. At the end of this survey, we will present 10 unanswered questions currently faced by long-context LLMs. We hope this survey can serve as a systematic introduction to the research on long-context LLMs.
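The abstract names length extrapolation as the historical starting point of long-context research. As a concrete illustration (not drawn from this paper), below is a minimal sketch of one well-known extrapolation technique, RoPE position interpolation: position indices are compressed by a scale factor so that a model trained on a short context only ever sees rotation angles from its training range. The function names and parameters here are hypothetical.

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies, one per pair of hidden dimensions.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def interpolated_angles(seq_len: int, head_dim: int, scale: float) -> torch.Tensor:
    # Position interpolation: divide position indices by `scale` so that a
    # model trained on seq_len / scale tokens sees only in-range positions.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, rope_frequencies(head_dim))

# Example: run a model trained with a 4K context at 32K by compressing 8x.
angles = interpolated_angles(seq_len=32768, head_dim=128, scale=8.0)
cos, sin = angles.cos(), angles.sin()  # applied to query/key vectors as usual
```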