Generalization or Memorization: Dynamic Decoding for Mode Steering
October 25, 2025
Author: Xuanming Zhang
cs.AI
Abstract
Large Language Models (LLMs) exhibit a troubling duality, capable of both
remarkable generalization and brittle, verbatim memorization of their training
data. This unpredictability undermines their reliability in high-stakes
applications. In this work, we propose a unified framework to understand,
identify, and control these distinct reasoning modes. First, we introduce a
theoretical model based on the Information Bottleneck (IB) principle,
formalizing generalization as the learning of a compressed, task-relevant
representation and memorization as a failure to compress. Building on this
theory, we develop Dynamic Mode Steering (DMS), a novel inference-time
algorithm that comprises two components: (1) a lightweight, causally grounded
linear probe that identifies the model's instantaneous reliance on
memorization, and (2) a dynamic activation steering mechanism that nudges the
model's computation towards pre-identified generalization circuits. We frame
DMS as a form of adaptive, self-contrastive decoding. Experiments on reasoning
and faithfulness tasks demonstrate that DMS significantly improves logical
consistency and factual accuracy, thereby offering a principled approach to
enhancing LLM reliability.
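
As a concrete reading of the theoretical framing, the standard Information Bottleneck Lagrangian is sketched below. The abstract does not state the authors' exact objective, so the notation here is an assumption: X is the input, Y the task target, Z the model's internal representation, and I(·;·) mutual information.

```latex
% Standard IB Lagrangian (assumed form; the paper's exact objective is not
% given in the abstract): compress X into Z while keeping Z predictive of Y.
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;Z) \;-\; \beta\, I(Z;Y)
```

Under this reading, generalization corresponds to low I(X;Z) at high I(Z;Y) (a compressed, task-relevant representation), while memorization corresponds to I(X;Z) remaining large: the representation retains input-specific detail instead of compressing.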
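The abstract describes DMS operationally: at each decoding step, a linear probe reads a memorization signal from hidden activations, and an activation-steering intervention nudges the computation toward generalization circuits when that signal is high. A minimal sketch of such a loop follows; every specific (probe weights, steering direction, layer choice, threshold, scale) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a Dynamic-Mode-Steering-style decoding step.
# The paper's probe training and circuit identification are not
# described in the abstract; everything below is a toy stand-in.

rng = np.random.default_rng(0)
D = 64                          # toy hidden size

probe_w = rng.normal(size=D)    # assumed linear probe: memorization direction
probe_b = 0.0
# Illustrative steering direction: push away from the memorization direction.
# (The paper steers toward pre-identified generalization circuits instead.)
steer_vec = -probe_w / np.linalg.norm(probe_w)

def memorization_score(h: np.ndarray) -> float:
    """Probe's estimate of instantaneous reliance on memorization, in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(h @ probe_w + probe_b)))

def dms_step(h: np.ndarray, threshold: float = 0.5, alpha: float = 2.0):
    """Steer the hidden state only when the probe fires.

    Scaling the intervention by how far the score exceeds the threshold
    is one way to read 'dynamic' in the abstract.
    """
    s = memorization_score(h)
    if s > threshold:
        h = h + alpha * (s - threshold) * steer_vec
    return h, s

h = rng.normal(size=D)          # stand-in for a decoder hidden state
h_steered, score = dms_step(h)
print(f"probe score={score:.2f}, steered={not np.allclose(h, h_steered)}")
```

The conditional intervention matters for the claimed reliability gains: steering only when the probe detects memorization leaves the model's computation untouched on inputs it already handles by generalization.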
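The framing of DMS as "adaptive, self-contrastive decoding" suggests contrasting the model's own steered and unsteered predictions at each step. The abstract gives no formula, so the following is an assumed instantiation using the common contrastive-decoding form, with the per-step weight tied to the probe score s_t:

```latex
% Assumed self-contrastive combination (standard contrastive-decoding form,
% not stated in the abstract). z_t^steer, z_t^base: logits with and without
% steering; lambda_t: adaptive weight driven by the probe score s_t.
p_{\mathrm{DMS}}(y_t \mid y_{<t}, x) \;=\;
  \mathrm{softmax}\!\big((1+\lambda_t)\, z_t^{\mathrm{steer}} \;-\; \lambda_t\, z_t^{\mathrm{base}}\big),
\qquad \lambda_t = \lambda_{\max}\, s_t
```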