ContextCite: Attributing Model Generation to Context
September 1, 2024
Authors: Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
cs.AI
Abstract
How do language models use information provided as context when generating a
response? Can we infer whether a particular generated statement is actually
grounded in the context, a misinterpretation, or fabricated? To help answer
these questions, we introduce the problem of context attribution: pinpointing
the parts of the context (if any) that led a model to generate a particular
statement. We then present ContextCite, a simple and scalable method for
context attribution that can be applied on top of any existing language model.
Finally, we showcase the utility of ContextCite through three applications: (1)
helping verify generated statements, (2) improving response quality by pruning
the context, and (3) detecting poisoning attacks. We provide code for
ContextCite at https://github.com/MadryLab/context-cite.
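To make the context-attribution problem concrete, here is a minimal illustrative sketch of one simple way to score context sources: leave-one-out ablation, where each source is scored by how much removing it lowers the model's score for the generated statement. This is not the ContextCite algorithm itself (the abstract does not describe its internals), and the `log_prob` function below is a hypothetical word-overlap stand-in for a real language model's log-probability.

```python
# Illustrative sketch: context attribution via leave-one-out ablation.
# This is a simple baseline, NOT the ContextCite method itself.
# `log_prob` is a toy stand-in for a real model's log-probability.

def log_prob(statement: str, context_sources: list[str]) -> float:
    """Toy scorer: fraction of statement words found in the context.
    A real implementation would query a language model instead."""
    context_words = set(" ".join(context_sources).lower().split())
    statement_words = statement.lower().split()
    if not statement_words:
        return 0.0
    hits = sum(w in context_words for w in statement_words)
    return hits / len(statement_words)

def attribute(statement: str, sources: list[str]) -> list[tuple[str, float]]:
    """Rank sources by the score drop when each one is removed."""
    full = log_prob(statement, sources)
    scored = []
    for i, src in enumerate(sources):
        ablated = sources[:i] + sources[i + 1:]
        scored.append((src, full - log_prob(statement, ablated)))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

sources = [
    "The Eiffel Tower is located in Paris.",
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
ranked = attribute("The Eiffel Tower is in Paris", sources)
```

Here the first source receives the largest attribution score, while the irrelevant banana fact scores zero. Leave-one-out requires one model call per source; a scalable method would need to avoid that linear cost for long contexts.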