LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
February 18, 2024
Authors: Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang
cs.AI
Abstract
Large language models (LLMs) have demonstrated impressive performance in
understanding language and executing complex reasoning tasks. However, LLMs
with long context windows have been notorious for their expensive training
costs and high inference latency. Even the most advanced models such as GPT-4
and Claude2 often make mistakes when processing inputs of over 100k tokens, a
phenomenon known as "lost in the middle." In this paper, we propose
LongAgent, a method based on multi-agent collaboration, which scales
LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority
in long-text processing compared to GPT-4. In LongAgent, a leader is
responsible for understanding user intent and directing team members to acquire
information from documents. Due to members' hallucinations, it is non-trivial
for a leader to obtain accurate information from the responses of dozens to
hundreds of members. To address this, we develop an inter-member
communication mechanism to resolve response conflicts caused by hallucinations
through information sharing. Our experimental results indicate that
LongAgent offers a promising alternative for long-text processing. The
agent team instantiated with LLaMA-7B achieves significant improvements over
GPT-4 in tasks such as 128k-long text retrieval and multi-hop question
answering.
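
The abstract describes the leader-member mechanism only at a high level. The following is a minimal Python sketch of how such a loop could be organized; every name here (query_llm, split_document, communicate, leader, CHUNK_CHARS), the fixed-size chunking, the pairwise communication round, and the majority vote are illustrative assumptions, not the authors' actual implementation.

# Hypothetical sketch of a LongAgent-style leader-member loop.
# query_llm is a placeholder to be wired to a model (e.g., LLaMA-7B
# behind any inference backend); details below are assumptions.
from collections import Counter

CHUNK_CHARS = 16_000  # assumed per-member slice of the long input

def query_llm(prompt: str) -> str:
    """Placeholder LLM call; plug in a real model here."""
    raise NotImplementedError("wire this to an LLM backend")

def split_document(document: str, size: int = CHUNK_CHARS) -> list[str]:
    # Naive fixed-size split so each member's input fits a short context.
    return [document[i:i + size] for i in range(0, len(document), size)]

def member_answer(chunk: str, question: str) -> str:
    # A member only ever sees its own chunk, never the full 128k input.
    return query_llm(f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer briefly:")

def communicate(chunk_a: str, chunk_b: str, question: str) -> str:
    # Inter-member communication: two members whose answers conflict share
    # their evidence and answer jointly (the paper's mechanism, simplified).
    merged = f"Evidence A:\n{chunk_a}\n\nEvidence B:\n{chunk_b}"
    return query_llm(f"{merged}\n\nQuestion: {question}\nAnswer briefly:")

def leader(document: str, question: str) -> str:
    chunks = split_document(document)
    if not chunks:
        return ""
    answers = [member_answer(c, question) for c in chunks]
    # The leader gathers responses; on disagreement, the two most common
    # conflicting answers trigger one communication round between members.
    counts = Counter(answers)
    if len(counts) > 1:
        (top_a, _), (top_b, _) = counts.most_common(2)
        i, j = answers.index(top_a), answers.index(top_b)
        answers.append(communicate(chunks[i], chunks[j], question))
        counts = Counter(answers)
    return counts.most_common(1)[0][0]  # majority answer after communication

The point of this structure is that no single call ever ingests the full document: each member's prompt stays within a short context window, so a short-context model can collectively cover a 128k-token input while the leader only handles the members' brief responses.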