Document Attribution: Examining Citation Relationships using Large Language Models
May 9, 2025
作者: Vipula Rawte, Ryan A. Rossi, Franck Dernoncourt, Nedim Lipka
cs.AI
Abstract
As Large Language Models (LLMs) are increasingly applied to document-based
tasks, such as document summarization, question answering, and information
extraction, where user requirements focus on retrieving information from
provided documents rather than relying on the model's parametric knowledge,
ensuring the trustworthiness and interpretability of these systems has become a
critical concern. A central approach to addressing this challenge is
attribution, which involves tracing the generated outputs back to their source
documents. However, since LLMs can produce inaccurate or imprecise responses,
it is crucial to assess the reliability of these citations.
To tackle this, our work proposes two techniques. (1) A zero-shot approach
that frames attribution as a straightforward textual entailment task. Our
method using flan-ul2 demonstrates an improvement of 0.27% and 2.4% over the
best baseline of ID and OOD sets of AttributionBench, respectively. (2) We also
explore the role of the attention mechanism in enhancing the attribution
process. Using a smaller LLM, flan-t5-small, the F1 scores outperform the
baseline across almost all layers except layer 4 and layers 8 through 11.
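As a rough illustration of the zero-shot framing described above (not the authors' released code; the prompt template, label parsing, and model choice are assumptions), each (document, claim) pair can be cast as a yes/no entailment query to an instruction-tuned model such as flan-ul2 or flan-t5-small:

```python
# Sketch of zero-shot attribution as textual entailment.
# The prompt wording and answer parsing are illustrative assumptions,
# not the exact setup evaluated on AttributionBench.

def build_entailment_prompt(document: str, claim: str) -> str:
    """Frame attribution as an entailment question for an instruction-tuned LLM."""
    return (
        "Premise: " + document + "\n"
        "Hypothesis: " + claim + "\n"
        "Does the premise entail the hypothesis? Answer yes or no:"
    )

def parse_entailment_answer(text: str) -> bool:
    """Map the model's free-form answer to attributable / not attributable."""
    return text.strip().lower().startswith("yes")

# Running an actual model (hypothetical usage, requires the transformers
# library and a model download) would look roughly like:
#   from transformers import pipeline
#   pipe = pipeline("text2text-generation", model="google/flan-t5-small")
#   prompt = build_entailment_prompt(doc_text, generated_claim)
#   answer = pipe(prompt)[0]["generated_text"]
#   attributable = parse_entailment_answer(answer)
```

The key design choice is that no task-specific fine-tuning is needed: attribution quality reduces to how well the underlying model judges entailment between a source passage and a generated claim.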