ChatQA: Building GPT-4 Level Conversational QA Models

Abstract

In this work, we introduce ChatQA, a family of conversational question answering (QA) models that attain GPT-4-level accuracy. Specifically, we propose a two-stage instruction tuning method that significantly improves the zero-shot conversational QA results of large language models (LLMs). To handle retrieval in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which yields results comparable to those of a state-of-the-art query rewriting model while substantially reducing deployment cost. Notably, ChatQA-70B outperforms GPT-4 in average score across 10 conversational QA datasets (54.14 vs. 53.90) without relying on any synthetic data from OpenAI GPT models.
