ContextCite: Attributing Model Generation to Context
September 1, 2024
Authors: Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
cs.AI
Abstract
How do language models use information provided as context when generating a
response? Can we infer whether a particular generated statement is actually
grounded in the context, a misinterpretation, or fabricated? To help answer
these questions, we introduce the problem of context attribution: pinpointing
the parts of the context (if any) that led a model to generate a particular
statement. We then present ContextCite, a simple and scalable method for
context attribution that can be applied on top of any existing language model.
Finally, we showcase the utility of ContextCite through three applications: (1)
helping verify generated statements, (2) improving response quality by pruning
the context, and (3) detecting poisoning attacks. We provide code for
ContextCite at https://github.com/MadryLab/context-cite.
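To make the context-attribution problem concrete, here is a minimal illustrative sketch of one simple way to score context sources: leave-one-out ablation, where each source is scored by how much removing it lowers the model's score for the generated statement. This is not the ContextCite algorithm itself (the abstract does not describe its internals), and the `log_prob` function below is a hypothetical word-overlap stand-in for a real language model's log-probability.

```python
# Illustrative sketch: context attribution via leave-one-out ablation.
# This is a simple baseline, NOT the ContextCite method itself.
# `log_prob` is a toy stand-in for a real model's log-probability.

def log_prob(statement: str, context_sources: list[str]) -> float:
    """Toy scorer: fraction of statement words found in the context.
    A real implementation would query a language model instead."""
    context_words = set(" ".join(context_sources).lower().split())
    statement_words = statement.lower().split()
    if not statement_words:
        return 0.0
    hits = sum(w in context_words for w in statement_words)
    return hits / len(statement_words)

def attribute(statement: str, sources: list[str]) -> list[tuple[str, float]]:
    """Rank sources by the score drop when each one is removed."""
    full = log_prob(statement, sources)
    scored = []
    for i, src in enumerate(sources):
        ablated = sources[:i] + sources[i + 1:]
        scored.append((src, full - log_prob(statement, ablated)))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

sources = [
    "The Eiffel Tower is located in Paris.",
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
ranked = attribute("The Eiffel Tower is in Paris", sources)
```

Here the first source receives the largest attribution score, while the irrelevant banana fact scores zero. Leave-one-out requires one model call per source; a scalable method would need to avoid that linear cost for long contexts.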