
Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

February 6, 2026
Authors: Jacqueline He, Jonathan Hayase, Wen-tau Yih, Sewoong Oh, Luke Zettlemoyer, Pang Wei Koh
cs.AI

Abstract

Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that yield a sequence-level guarantee, enabling a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model (TinyComma 1.8B), as well as Anchored-Byte Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility. Anchored and Anchored-Byte Decoding define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at a modest inference overhead.
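
To make the budgeted-anchoring idea concrete, here is a minimal PyTorch sketch of one plausible realization. Everything in it is an illustrative assumption rather than the paper's exact algorithm: the geometric mixture of the risky and safe distributions, the grid search over the mixing weight, the uniform roll-over allocation of the remaining budget, the Hugging Face-style `.logits` interface, and the names `anchored_step` / `anchored_decode` are all hypothetical. It only shows how per-step KL constraints against a safe model could compose into a sequence-level information budget.

```python
import torch
import torch.nn.functional as F

def kl(p_log, q_log):
    """KL(p || q) in nats, for log-probability vectors over the vocabulary."""
    p = p_log.exp()
    return (p * (p_log - q_log)).sum()

@torch.no_grad()
def anchored_step(risky_logits, safe_logits, step_budget, n_grid=64):
    """One decoding step: find the largest mixing weight alpha such that the
    mixed distribution stays within `step_budget` nats of the safe model.

    Hypothetical construction (not necessarily the paper's): a geometric mixture
        log q = alpha * log p_risky + (1 - alpha) * log p_safe  (renormalized),
    assuming KL(q || p_safe) grows monotonically in alpha, so we can stop at
    the first infeasible grid point.
    """
    log_risky = F.log_softmax(risky_logits, dim=-1)
    log_safe = F.log_softmax(safe_logits, dim=-1)
    best_logq, spent = log_safe, 0.0  # alpha = 0 is always feasible (cost 0)
    for alpha in torch.linspace(0.0, 1.0, n_grid):
        log_q = torch.log_softmax(alpha * log_risky + (1 - alpha) * log_safe, dim=-1)
        cost = kl(log_q, log_safe).item()
        if cost <= step_budget:
            best_logq, spent = log_q, cost  # keep the riskiest feasible mix
        else:
            break
    return best_logq, spent

@torch.no_grad()
def anchored_decode(risky_lm, safe_lm, input_ids, total_budget, max_new_tokens):
    """Sketch of anchored decoding: spread a sequence-level budget (in nats)
    uniformly over the remaining steps; any unspent budget rolls over, so the
    per-step allocation adapts along the generation trajectory."""
    remaining = total_budget
    for t in range(max_new_tokens):
        risky_logits = risky_lm(input_ids).logits[:, -1, :].squeeze(0)
        safe_logits = safe_lm(input_ids).logits[:, -1, :].squeeze(0)
        step_budget = remaining / (max_new_tokens - t)  # adaptive allocation
        log_q, spent = anchored_step(risky_logits, safe_logits, step_budget)
        remaining -= spent
        next_id = torch.multinomial(log_q.exp(), 1)  # sample the anchored dist.
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
    return input_ids
```

Under this construction, setting `total_budget = 0` reproduces the safe model exactly, while a large budget recovers the risky model, which is one simple way a single scalar could trace out the risk-utility trade-off the abstract describes.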