Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
February 6, 2026
Authors: Jacqueline He, Jonathan Hayase, Wen-tau Yih, Sewoong Oh, Luke Zettlemoyer, Pang Wei Koh
cs.AI
Abstract
Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises consent and compensation concerns for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables safe decoding from any risky LM trained on mixed-license data by keeping generation within bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that compose into a sequence-level guarantee, yielding a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model, TinyComma (1.8B parameters), as well as Anchored_Byte Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate both methods across six model pairs on long-form assessments of copyright risk and utility. Anchored and Anchored_Byte Decoding define a new Pareto frontier: they preserve near-original fluency and factuality while closing up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at modest inference overhead.
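
The abstract only outlines the mechanism, so the following is a minimal illustrative sketch rather than the paper's algorithm. It assumes the per-step constraint is a KL bound on the anchored distribution relative to the safe LM, enforced by geometric interpolation of the two models' logits, and that the sequence-level budget is re-split uniformly over the remaining steps; every name here (anchored_step, step_budget_nats, the risky_lm/safe_lm callables) is hypothetical.

```python
# Hypothetical sketch of anchored decoding. Assumptions (not from the paper):
# the per-step constraint is a KL bound w.r.t. the safe model, enforced by
# geometrically interpolating logits; the budget is re-split uniformly over
# the remaining steps. risky_lm/safe_lm are stand-in callables: ids -> logits.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def anchored_step(risky_logits, safe_logits, step_budget_nats):
    """Move as far toward the risky model as the per-step KL budget
    (measured against the safe model) allows."""
    p_safe = softmax(safe_logits)
    lo, hi = 0.0, 1.0  # lam = 0 -> pure safe model, lam = 1 -> pure risky model
    for _ in range(30):  # binary search for the largest admissible mixture weight
        lam = (lo + hi) / 2
        p = softmax(lam * risky_logits + (1.0 - lam) * safe_logits)
        if kl(p, p_safe) <= step_budget_nats:
            lo = lam
        else:
            hi = lam
    p = softmax(lo * risky_logits + (1.0 - lo) * safe_logits)
    return p, kl(p, p_safe)

def generate(risky_lm, safe_lm, prompt_ids, max_new_tokens, total_budget_nats, rng):
    """Spend a sequence-level information budget adaptively: steps where the two
    models already agree cost little, leaving more budget for later steps."""
    ids, remaining = list(prompt_ids), total_budget_nats
    for t in range(max_new_tokens):
        step_budget = remaining / (max_new_tokens - t)  # uniform over what's left
        p, spent = anchored_step(risky_lm(ids), safe_lm(ids), step_budget)
        remaining = max(remaining - spent, 0.0)
        ids.append(int(rng.choice(len(p), p=p)))
    return ids
```

Capping each conditional step this way is one plausible route to the sequence-level guarantee the abstract mentions: by the chain rule for KL divergence, per-step caps that sum to the total budget bound the expected sequence-level divergence of the anchored process from the safe LM by that same budget.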