LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
September 4, 2024
Authors: Jiajie Zhang, Yushi Bai, Xin Lv, Wanjun Gu, Danqing Liu, Minhao Zou, Shulin Cao, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li
cs.AI
Abstract
Though current long-context large language models (LLMs) have demonstrated
impressive capacities in answering user questions based on extensive text, the
lack of citations in their responses makes user verification difficult, leading
to concerns about their trustworthiness due to their potential hallucinations.
In this work, we aim to enable long-context LLMs to generate responses with
fine-grained sentence-level citations, improving their faithfulness and
verifiability. We first introduce LongBench-Cite, an automated benchmark for
assessing current LLMs' performance in Long-Context Question Answering with
Citations (LQAC), revealing considerable room for improvement. To this end, we
propose CoF (Coarse to Fine), a novel pipeline that utilizes off-the-shelf LLMs
to automatically generate long-context QA instances with precise sentence-level
citations, and leverage this pipeline to construct LongCite-45k, a large-scale
SFT dataset for LQAC. Finally, we train LongCite-8B and LongCite-9B using the
LongCite-45k dataset, successfully enabling their generation of accurate
responses and fine-grained sentence-level citations in a single output. The
evaluation results on LongBench-Cite show that our trained models achieve
state-of-the-art citation quality, surpassing advanced proprietary models
including GPT-4o.