ContextCite: Attributing Model Generation to Context
September 1, 2024
Authors: Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
cs.AI
Abstract
How do language models use information provided as context when generating a
response? Can we infer whether a particular generated statement is actually
grounded in the context, a misinterpretation, or fabricated? To help answer
these questions, we introduce the problem of context attribution: pinpointing
the parts of the context (if any) that led a model to generate a particular
statement. We then present ContextCite, a simple and scalable method for
context attribution that can be applied on top of any existing language model.
Finally, we showcase the utility of ContextCite through three applications: (1)
helping verify generated statements, (2) improving response quality by pruning
the context, and (3) detecting poisoning attacks. We provide code for
ContextCite at https://github.com/MadryLab/context-cite.
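The core idea behind context attribution can be sketched as follows: ablate random subsets of the context's sources, re-score the fixed response under each ablation, and fit a linear surrogate whose weights rank the sources by influence. The sketch below is a minimal, hedged illustration, not the authors' implementation: `attribute_context`, `score`, and `toy_score` are hypothetical names, and the paper's sparse (LASSO) surrogate is replaced here by ordinary least squares to keep the example dependency-free.

```python
import numpy as np

def attribute_context(score, num_sources, num_samples=64, seed=0):
    """Estimate how much each context source contributes to a response.

    `score(mask)` is assumed to return the model's log-probability of the
    (fixed) generated response when only the sources with mask[i] == 1 are
    kept in the context. We fit a linear surrogate to random ablations;
    its weights rank the sources by influence.
    """
    rng = np.random.default_rng(seed)
    # Random ablation masks: each source is kept with probability 1/2.
    masks = rng.integers(0, 2, size=(num_samples, num_sources))
    scores = np.array([score(m) for m in masks])
    # Linear surrogate via least squares (the paper uses a sparse LASSO
    # surrogate; plain least squares keeps this sketch self-contained).
    X = np.hstack([masks, np.ones((num_samples, 1))])  # add intercept column
    weights, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return weights[:-1]  # per-source attribution scores (intercept dropped)

# Toy scoring function standing in for a real language model: the
# response's log-probability depends almost entirely on source 2.
def toy_score(mask):
    return 3.0 * mask[2] + 0.1 * mask[0]

weights = attribute_context(toy_score, num_sources=4)
print(int(np.argmax(weights)))  # source 2 should rank highest
```

In practice the expensive part is evaluating `score` under many ablations, which is why a small number of random masks plus a surrogate model is attractive; the released library at the repository above packages this workflow for real language models.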