Recursive Language Models
December 31, 2025
Authors: Alex L. Zhang, Tim Kraska, Omar Khattab
cs.AI
Abstract
We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform base LLMs and common long-context scaffolds in answer quality across four diverse long-context tasks, at comparable (or lower) cost per query.
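To make the recursive strategy concrete, below is a minimal sketch of the decomposition loop the abstract describes. It is not the authors' implementation: the function `rlm`, the user-supplied `llm_call`, and the constant `CONTEXT_LIMIT` are hypothetical stand-ins, and where the paper has the LLM itself programmatically decide how to inspect and split the prompt, this sketch simplifies to fixed-size chunking.

```python
# A minimal, illustrative sketch of a recursive language model loop.
# `llm_call` and `CONTEXT_LIMIT` are assumed placeholders, not an API
# from the paper; the real RLM lets the model choose how to examine
# and decompose the prompt rather than using fixed-size chunks.

from typing import Callable

CONTEXT_LIMIT = 8_000  # assumed per-call character budget


def rlm(prompt: str, query: str, llm_call: Callable[[str], str]) -> str:
    """Recursively answer `query` over a prompt of arbitrary length.

    If the prompt fits in the context window, answer directly.
    Otherwise, treat the prompt as an external environment: split it
    into snippets, recursively call ourselves on each snippet, and
    synthesize the partial answers with a final recursive call.
    """
    if len(prompt) <= CONTEXT_LIMIT:
        # Base case: the prompt fits, so call the model directly.
        return llm_call(f"{prompt}\n\nQuestion: {query}")

    # Recursive case: decompose the prompt into fixed-size snippets.
    snippets = [prompt[i:i + CONTEXT_LIMIT]
                for i in range(0, len(prompt), CONTEXT_LIMIT)]

    # Query each snippet for information relevant to `query`.
    partials = [rlm(snippet, query, llm_call) for snippet in snippets]

    # Synthesize the partial answers; recursing again handles the case
    # where the concatenated partials still exceed the context window.
    # (Terminates assuming answers are much shorter than their snippets.)
    merged = "\n---\n".join(partials)
    return rlm(merged,
               f"Combine these partial answers to answer: {query}",
               llm_call)
```

One design point this sketch preserves from the abstract: the long prompt never enters a single model call, so the depth of recursion, not the context window, bounds the input length the strategy can handle.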