
Generalization or Memorization: Dynamic Decoding for Mode Steering

October 25, 2025
Author: Xuanming Zhang
cs.AI

Abstract

Large Language Models (LLMs) exhibit a troubling duality, capable of both remarkable generalization and brittle, verbatim memorization of their training data. This unpredictability undermines their reliability in high-stakes applications. In this work, we propose a unified framework to understand, identify, and control these distinct reasoning modes. First, we introduce a theoretical model based on the Information Bottleneck (IB) principle, formalizing generalization as the learning of a compressed, task-relevant representation and memorization as a failure to compress. Building on this theory, we develop Dynamic Mode Steering (DMS), a novel inference-time algorithm which comprises two components: (1) a lightweight, causally-grounded linear probe that identifies the model's instantaneous reliance on memorization, and (2) a dynamic activation steering mechanism that nudges the model's computation towards pre-identified generalization circuits. We frame DMS as a form of adaptive, self-contrastive decoding. Experiments on reasoning and faithfulness tasks demonstrate that DMS significantly improves logical consistency and factual accuracy, thereby offering a principled approach to enhancing LLM reliability.
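The two DMS components described above (a linear probe that scores the model's instantaneous reliance on memorization, and an activation-steering step that nudges hidden states toward a pre-identified generalization direction) can be sketched as follows. This is a minimal numpy illustration under assumptions of the editor's own: the probe weights `w`, `b`, the steering direction `v_gen`, and the hyperparameters `alpha` and `threshold` are hypothetical placeholders, not the authors' trained artifacts or exact algorithm.

```python
import numpy as np

def probe_memorization(h, w, b):
    """Lightweight linear probe: sigmoid score in (0, 1) estimating how
    strongly the hidden state h reflects memorization (hypothetical weights)."""
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

def dynamic_mode_steering(h, w, b, v_gen, alpha=2.0, threshold=0.5):
    """If the probe flags memorization, add a scaled unit vector along a
    pre-identified 'generalization' direction v_gen to the hidden state.
    Returns the (possibly steered) hidden state and the probe score."""
    score = probe_memorization(h, w, b)
    if score > threshold:
        h = h + alpha * score * (v_gen / np.linalg.norm(v_gen))
    return h, score

# Toy usage: a 4-dim hidden state that the probe scores as memorization-like.
h = np.ones(4)                  # hidden state at one decoding step
w, b = np.ones(4), 0.0          # hypothetical probe parameters
v_gen = np.array([1.0, 0, 0, 0])  # hypothetical generalization direction
h_out, score = dynamic_mode_steering(h, w, b, v_gen)
```

In a real decoder this gate would run per token at a chosen layer, which is what makes the scheme "dynamic": steering is applied only when the probe fires, leaving generalization-mode computation untouched.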
PDF · December 1, 2025