Recursive Language Models
December 31, 2025
Authors: Alex L. Zhang, Tim Kraska, Omar Khattab
cs.AI
Abstract
We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform base LLMs and common long-context scaffolds in quality across four diverse long-context tasks, while having comparable (or cheaper) cost per query.
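
To make the abstract's decompose-and-recurse idea concrete, here is a minimal Python sketch of one way such a strategy could look. It is an illustration only, not the paper's implementation: `rlm_answer`, `call_llm`, and parameters such as `chunk_size` and `max_depth` are assumed names introduced for this example, and the fixed-size chunking and single aggregation step are simplifications of whatever programmatic examination an actual RLM performs.

```python
# Illustrative sketch only; the paper's actual RLM procedure may differ.
# All names here (rlm_answer, call_llm, chunk_size, max_depth) are
# assumptions made for this example, not the authors' API.

from typing import Callable


def rlm_answer(
    prompt: str,
    query: str,
    call_llm: Callable[[str], str],
    chunk_size: int = 4000,
    depth: int = 0,
    max_depth: int = 3,
) -> str:
    """Answer `query` about `prompt`, recursing when the prompt is too long.

    The long prompt is treated as external data: if it fits the context
    budget (or recursion is exhausted), the base LLM is called directly;
    otherwise the prompt is split into snippets, the procedure recurses on
    each snippet, and the partial answers are aggregated with a final call.
    """
    if len(prompt) <= chunk_size or depth >= max_depth:
        return call_llm(f"Context:\n{prompt}\n\nQuestion: {query}")

    # Decompose the oversized prompt into snippets and recurse on each.
    snippets = [prompt[i:i + chunk_size] for i in range(0, len(prompt), chunk_size)]
    partials = [
        rlm_answer(s, query, call_llm, chunk_size, depth + 1, max_depth)
        for s in snippets
    ]

    # Aggregate the per-snippet findings with one final LLM call.
    combined = "\n".join(f"- {p}" for p in partials)
    return call_llm(
        "Partial findings from snippets of a long document:\n"
        f"{combined}\n\nQuestion: {query}\nSynthesize a final answer."
    )


if __name__ == "__main__":
    # Toy stand-in for a real model call, so the sketch runs end to end.
    def call_llm(p: str) -> str:
        return f"[model response to {len(p)} chars of input]"

    long_doc = "lorem ipsum " * 5000
    print(rlm_answer(long_doc, "What does the document discuss?", call_llm))
```

Under these assumptions, the recursion depth and chunk size bound how far beyond the base model's context window the strategy can reach, which is one way to read the abstract's claim of handling inputs up to two orders of magnitude longer than the context window.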