ChatQA: Building GPT-4 Level Conversational QA Models

Abstract

In this work, we introduce ChatQA, a family of conversational question answering (QA) models that attain GPT-4-level accuracy. Specifically, we propose a two-stage instruction tuning method that significantly improves the zero-shot conversational QA results of large language models (LLMs). To handle retrieval in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset; this yields results comparable to using the state-of-the-art query rewriting model while substantially reducing deployment cost. Notably, our ChatQA-70B outperforms GPT-4 in average score across 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
