
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

July 19, 2024
Authors: Peng Xu, Wei Ping, Xianchao Wu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro
cs.AI

Abstract

In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are essential for LLMs to process large volumes of information that cannot fit into a single prompt, and they complement each other depending on the downstream task and computational budget. We present a detailed continued-training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model achieves accuracy comparable to GPT-4-Turbo-2024-04-09 on many long-context understanding tasks and surpasses it on the RAG benchmark. Interestingly, we find that a state-of-the-art long-context retriever can alleviate the top-k context fragmentation issue in RAG, further improving RAG-based results for long-context understanding tasks. We also provide extensive comparisons between RAG and long-context solutions using state-of-the-art long-context LLMs.
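To make the top-k RAG pattern the abstract refers to concrete, here is a minimal, hypothetical sketch: a long document is split into fixed-size chunks, chunks are scored against a query, and the k best chunks form the prompt context. The word-overlap scoring and all function names below are illustrative stand-ins; the paper uses a learned long-context retriever, not this toy scorer.

```python
# Toy sketch of top-k retrieval for RAG (illustrative only; the paper's
# retriever is a learned long-context model, not word overlap).

def chunk_text(text, chunk_size=64):
    """Split text into fixed-size word chunks, a common RAG setup."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def score(query, chunk):
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_top_k(query, document, k=3, chunk_size=64):
    """Return the k highest-scoring chunks, re-sorted into document
    order to limit the context-fragmentation effect the abstract notes."""
    chunks = chunk_text(document, chunk_size)
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(query, chunks[i]),
                    reverse=True)[:k]
    return [chunks[i] for i in sorted(ranked)]  # restore original order

if __name__ == "__main__":
    doc = "retrieval augmented generation feeds retrieved chunks " * 50
    print(retrieve_top_k("retrieved chunks", doc, k=2, chunk_size=8))
```

Keeping the selected chunks in document order (rather than relevance order) is one simple way to reduce fragmentation; the paper's observation is that a stronger retriever mitigates this issue further.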

