

Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

February 2, 2026
Authors: Ziwen Xu, Chenyan Wu, Hengyu Sun, Haiwen Hong, Mengru Wang, Yunzhi Yao, Longtao Huang, Hui Xue, Shumin Deng, Zhixuan Chu, Huajun Chen, Ningyu Zhang
cs.AI

Abstract

Methods for controlling large language models (LLMs), including local weight fine-tuning, LoRA-based adaptation, and activation-based interventions, are often studied in isolation, obscuring their connections and making comparison difficult. In this work, we present a unified view that frames these interventions as dynamic weight updates induced by a control signal, placing them within a single conceptual framework. Building on this view, we propose a unified preference-utility analysis that separates control effects into preference (the tendency toward a target concept) and utility (coherent, task-valid generation), and measures both on a shared log-odds scale using polarity-paired contrastive examples. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility. We further explain this behavior through an activation manifold perspective, in which control shifts representations along target-concept directions to enhance preference, while utility declines primarily when interventions push representations off the model's valid-generation manifold. Finally, guided by this analysis, we introduce SPLIT, a new steering approach that improves preference while better preserving utility. Code is available at https://github.com/zjunlp/EasyEdit/blob/main/examples/SPLIT.md.
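To make the "interventions as weight updates" idea concrete, here is a minimal sketch under stated assumptions. It treats a single linear layer y = Wx + b as a stand-in for a model block (an assumption; the abstract does not give the paper's exact formulation). It shows that adding a steering vector alpha * v to the layer output is algebraically the same as the bias update b -> b + alpha * v, and that a LoRA-style adapter is the weight update W -> W + alpha * BA, so both are parameter updates scaled by a control signal alpha. All function names are hypothetical, and `pair_log_odds` is only an illustrative preference score for one polarity pair (log p(positive) minus log p(negative)), not the paper's metric. SPLIT itself is not implemented here.

```python
"""Illustrative sketch (hypothetical names): steering and LoRA on one linear
layer y = W x + b, both expressed as updates scaled by a control signal alpha."""
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add_vec(a: Vector, b: Vector, scale: float = 1.0) -> Vector:
    # Elementwise a + scale * b.
    return [ai + scale * bi for ai, bi in zip(a, b)]


def add_mat(A: Matrix, B: Matrix, scale: float = 1.0) -> Matrix:
    # Elementwise A + scale * B.
    return [add_vec(ra, rb, scale) for ra, rb in zip(A, B)]


def low_rank(B: Matrix, A: Matrix) -> Matrix:
    # B (d_out x r) @ A (r x d_in): the low-rank product used by LoRA.
    cols = list(zip(*A))
    return [[sum(bi * ai for bi, ai in zip(row, col)) for col in cols] for row in B]


def layer(W: Matrix, b: Vector, x: Vector) -> Vector:
    # Unmodified layer: y = W x + b.
    return add_vec(matvec(W, x), b)


def steer_activation(W: Matrix, b: Vector, x: Vector, v: Vector, alpha: float) -> Vector:
    # Activation steering: add alpha * v to the layer output.
    return add_vec(layer(W, b, x), v, alpha)


def steer_as_bias_update(W: Matrix, b: Vector, x: Vector, v: Vector, alpha: float) -> Vector:
    # The same intervention rewritten as a parameter update: b -> b + alpha * v.
    return layer(W, add_vec(b, v, alpha), x)


def steer_lora(W: Matrix, b: Vector, x: Vector, B: Matrix, A: Matrix, alpha: float) -> Vector:
    # LoRA-style adaptation: W -> W + alpha * (B A).
    return layer(add_mat(W, low_rank(B, A), alpha), b, x)


def pair_log_odds(logp_pos: float, logp_neg: float) -> float:
    # Hypothetical preference score for one polarity pair: the log-odds of the
    # positive completion when the choice is restricted to {positive, negative}.
    return logp_pos - logp_neg


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    x = [2.0, 3.0]
    v = [1.0, -1.0]
    print(steer_activation(W, b, x, v, 0.5))      # [2.5, 2.5]
    print(steer_as_bias_update(W, b, x, v, 0.5))  # [2.5, 2.5]
    print(steer_lora(W, b, x, [[1.0], [0.0]], [[0.0, 1.0]], 1.0))  # [5.0, 3.0]
    print(pair_log_odds(math.log(0.6), math.log(0.2)))  # about log(3)
```

The first two calls return identical outputs, which is the point of the sketch: an additive activation intervention can be folded into the layer's parameters, and increasing alpha scales the update in the same way for every method. In a real model the steering vector would be added to hidden states (for example via a forward hook), and the mapping to a weight update depends on where the vector is injected.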
March 12, 2026