ChatQA: Building GPT-4 Level Conversational QA Models

Abstract

In this work, we introduce ChatQA, a family of conversational question answering (QA) models that achieve GPT-4-level accuracy. Specifically, we propose a two-stage instruction tuning method that significantly improves the zero-shot conversational QA results of large language models (LLMs). To handle retrieval in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which yields results comparable to those of a state-of-the-art query rewriting model while substantially reducing deployment cost. Notably, ChatQA-70B outperforms GPT-4 in average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
