LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
September 4, 2024
Authors: Jiajie Zhang, Yushi Bai, Xin Lv, Wanjun Gu, Danqing Liu, Minhao Zou, Shulin Cao, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li
cs.AI
Abstract
Though current long-context large language models (LLMs) have demonstrated impressive capacities in answering user questions based on extensive text, the lack of citations in their responses makes user verification difficult, raising concerns about their trustworthiness given their potential to hallucinate. In this work, we aim to enable long-context LLMs to generate responses with fine-grained sentence-level citations, improving their faithfulness and verifiability. We first introduce LongBench-Cite, an automated benchmark for assessing current LLMs' performance in Long-Context Question Answering with Citations (LQAC), which reveals considerable room for improvement. To this end, we propose CoF (Coarse to Fine), a novel pipeline that uses off-the-shelf LLMs to automatically generate long-context QA instances with precise sentence-level citations, and we leverage this pipeline to construct LongCite-45k, a large-scale SFT dataset for LQAC. Finally, we train LongCite-8B and LongCite-9B on the LongCite-45k dataset, enabling them to generate accurate responses and fine-grained sentence-level citations in a single output. Evaluation results on LongBench-Cite show that our trained models achieve state-of-the-art citation quality, surpassing advanced proprietary models including GPT-4o.
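The abstract does not specify how a single output carries both the answer and its sentence-level citations. The following is a minimal sketch under assumed conventions, not the authors' format or the CoF pipeline. It assumes three things. First, the context is split into sentences labeled `<C0>`, `<C1>`, and so on. Second, each answer statement is wrapped in `<statement>...<cite>...</cite></statement>`. Third, citations are inclusive, 0-indexed sentence ranges written as `[i-j]`. The tag names, the range syntax, the naive sentence splitter, and the helpers `split_sentences`, `label_context`, and `parse_cited_response` are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Statement:
    text: str                  # one claim from the model's answer
    cited_sentences: list[str]  # context sentences the claim points to


def split_sentences(context: str) -> list[str]:
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", context.strip())
    return [p for p in parts if p]


def label_context(sentences: list[str]) -> str:
    # Hypothetical labeling: prefix each sentence with <C{i}> so the model can cite indices.
    return " ".join(f"<C{i}>{s}" for i, s in enumerate(sentences))


# Hypothetical output format: <statement>claim<cite>[i-j][k-l]</cite></statement>
STATEMENT_RE = re.compile(r"<statement>(.*?)<cite>(.*?)</cite></statement>", re.DOTALL)
SPAN_RE = re.compile(r"\[(\d+)-(\d+)\]")


def parse_cited_response(response: str, sentences: list[str]) -> list[Statement]:
    # Map every cited [i-j] range back to the exact context sentences so a
    # user can check each claim against its evidence.
    statements = []
    for text, cites in STATEMENT_RE.findall(response):
        cited = []
        for start, end in SPAN_RE.findall(cites):
            # Ranges are inclusive and 0-indexed here; clamp out-of-range indices.
            lo = max(int(start), 0)
            hi = min(int(end), len(sentences) - 1)
            cited.extend(sentences[i] for i in range(lo, hi + 1))
        statements.append(Statement(text.strip(), cited))
    return statements


if __name__ == "__main__":
    ctx = ("Paris is the capital of France. It lies on the Seine. "
           "Berlin is the capital of Germany.")
    sents = split_sentences(ctx)
    print(label_context(sents))
    out = ("<statement>Paris, on the Seine, is France's capital.<cite>[0-1]</cite></statement>"
           "<statement>Germany's capital is Berlin.<cite>[2-2]</cite></statement>")
    for st in parse_cited_response(out, sents):
        print(st.text, "->", st.cited_sentences)
```

Resolving each citation to concrete sentences, instead of to whole passages, is what makes the citations "fine-grained." A reader can check each claim against one or two sentences rather than a long chunk. Text outside `<statement>` tags is ignored in this sketch.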