LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Abstract

Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows are notorious for their expensive training costs and high inference latency. Even the most advanced models, such as GPT-4 and Claude2, often make mistakes when processing inputs of over 100k tokens, a phenomenon known as "lost in the middle". In this paper, we propose LongAgent, a method based on multi-agent collaboration that scales LLMs (e.g., LLaMA) to a context of 128k and demonstrates potential superiority over GPT-4 in long-text processing. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Because members hallucinate, it is non-trivial for the leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism that resolves response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing. Compared to GPT-4, the agent team instantiated with LLaMA-7B achieves significant improvements in tasks such as 128k-long text retrieval and multi-hop question answering.
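
The leader/member workflow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify how the document is divided, how conflicts are detected, or how members share information, so everything below is an assumption made for illustration. The names `split_into_chunks`, `leader_answer`, and `toy_member` are hypothetical, the members are plain Python callables standing in for LLM calls, and a "hallucination" is simulated as a made-up guess whenever a member's text does not contain the answer.

```python
from typing import Callable, Dict, List

# Hypothetical member interface: (text_seen, question) -> answer string.
# In a real system this would be an LLM call; here it is any callable.
Member = Callable[[str, str], str]


def split_into_chunks(words: List[str], chunk_size: int) -> List[str]:
    """Split a list of words into consecutive chunks of chunk_size words."""
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def leader_answer(chunks: List[str], question: str, member: Member) -> str:
    """Collect one answer per chunk, then resolve conflicts by sharing text.

    Assumption: a conflict is any pair of distinct answers, and "information
    sharing" means re-asking the question over the combined text of the two
    disagreeing members. The source does not define either of these.
    """
    # Step 1: each member answers from its own chunk only. For every
    # distinct answer, remember the text seen by one member who gave it.
    seen: Dict[str, str] = {}
    for chunk in chunks:
        seen.setdefault(member(chunk, question), chunk)

    # Step 2: while answers disagree, pair two of them, pool their text,
    # and re-ask. Each round removes two answers and adds at most one,
    # so the loop always terminates.
    while len(seen) > 1:
        ans_a, ans_b = list(seen)[:2]
        shared = seen.pop(ans_a) + "\n" + seen.pop(ans_b)
        seen[member(shared, question)] = shared
    return next(iter(seen))


def toy_member(text: str, question: str) -> str:
    """Stand-in for an LLM member.

    Returns the value following '<question>=' if the text contains it,
    otherwise a made-up answer that simulates a hallucination.
    """
    for token in text.split():
        if token.startswith(question + "="):
            return token.split("=", 1)[1]
    return "guess-" + str(len(text) % 3)


if __name__ == "__main__":
    words = "alpha beta passkey=7421 gamma delta epsilon zeta eta".split()
    chunks = split_into_chunks(words, chunk_size=2)
    print(leader_answer(chunks, "passkey", toy_member))
```

In this toy run only one of the four chunks contains the passkey, so three members return guesses. Once a guessing member's text is pooled with the text that holds the evidence, the re-asked answer is the grounded one, which is the intuition behind resolving hallucination-induced conflicts through sharing. The sketch lets pooled text grow without bound; a real system with fixed member context windows would need a different sharing strategy.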
