ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
July 19, 2024
Authors: Peng Xu, Wei Ping, Xianchao Wu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro
cs.AI
Abstract
In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge
the gap between open-access LLMs and leading proprietary models (e.g.,
GPT-4-Turbo) in long-context understanding and retrieval-augmented generation
(RAG) capabilities. These two capabilities are essential for LLMs to process
large volumes of information that cannot fit into a single prompt and are
complementary to each other, depending on the downstream tasks and
computational budgets. We present a detailed continued training recipe to
extend the context window of Llama3-70B-base from 8K to 128K tokens, along with
a three-stage instruction tuning process to enhance the model's
instruction-following, RAG performance, and long-context understanding
capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model
achieves accuracy comparable to GPT-4-Turbo-2024-0409 on many long-context
understanding tasks and surpasses it on the RAG benchmark. Interestingly, we
find that the state-of-the-art long-context retriever can alleviate the top-k
context fragmentation issue in RAG, further improving RAG-based results for
long-context understanding tasks. We also provide extensive comparisons between
RAG and long-context solutions using state-of-the-art long-context LLMs.
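The abstract's observation that a long-context retriever can ease top-k context fragmentation can be pictured with a small sketch: retrieving fewer, larger passages and concatenating them into one long prompt for a 128K-context model. The sketch below is not the paper's code; the chunk size, value of k, placeholder embedding function, and prompt format are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): long-chunk retrieval feeding a
# long-context LLM, so the top-k context arrives in fewer, larger pieces.
from typing import List
import numpy as np


def chunk_document(text: str, chunk_tokens: int = 1200) -> List[str]:
    """Split a document into large chunks; whitespace tokens stand in for a
    real tokenizer. Larger chunks mean fewer boundaries and less
    fragmentation in the retrieved context."""
    words = text.split()
    return [" ".join(words[i:i + chunk_tokens])
            for i in range(0, len(words), chunk_tokens)]


def embed(texts: List[str]) -> np.ndarray:
    """Placeholder for a long-context retriever encoder (one that can encode
    passages of thousands of tokens). Returns unit-norm vectors; replace the
    random projection with a real encoder in practice."""
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 768))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def retrieve_top_k(query: str, chunks: List[str], k: int = 5) -> List[str]:
    """Rank chunks by cosine similarity to the query and keep the top-k,
    preserving the original document order when assembling the context."""
    q = embed([query])[0]
    c = embed(chunks)
    scores = c @ q
    order = np.argsort(-scores)[:k]
    return [chunks[i] for i in sorted(order)]


def build_prompt(query: str, retrieved: List[str]) -> str:
    """Concatenate the retrieved chunks into one prompt; with a 128K-token
    context window the assembled context can far exceed an 8K window."""
    context = "\n\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

A usage example would chunk a long report, call retrieve_top_k with the user question, and pass build_prompt's output to a long-context chat model; swapping in a stronger long-context retriever changes only the embed step.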