ContextCite: Attributing Model Generation to Context

September 1, 2024
Authors: Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
cs.AI

Abstract
How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements (2) improving response quality by pruning the context and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
