Internal Flow Signatures for Self-Checking and Refinement in LLMs

Abstract

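Large language models can generate fluent answers that are unfaithful to the provided context, yet many existing safeguards rely on external verification or a separate judge applied after generation. We introduce internal flow signatures, which audit decision formation from depthwise dynamics at a fixed inter-block monitoring boundary. The method stabilizes token-wise motion via bias-centered monitoring, then summarizes trajectories in compact, moving, readout-aligned subspaces constructed from the top token and its close competitors within each depth window. Neighboring window frames are aligned by an orthogonal transport, yielding depth-comparable transported step lengths, turning angles, and subspace drift summaries that are invariant to the choice of basis within a window. A lightweight GRU validator trained on these signatures performs self-checking without modifying the base model. Beyond detection, the validator localizes a culprit depth event and enables a targeted refinement: the model rolls back to the culprit token and clamps an abnormal transported step at the identified block while preserving the orthogonal residual. The resulting pipeline provides actionable localization and low-overhead self-checking from internal decision dynamics. Code is available at github.com/EavnJeong/Internal-Flow-Signatures-for-Self-Checking-and-Refinement-in-LLMs.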
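To make the geometric quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a "step" is the difference between monitored states at consecutive depths, that each window's subspace is given as a list of orthonormal basis vectors, and that clamping means shrinking only the in-subspace part of a step to a length cap. The helper names (`project`, `turning_angle`, `clamp_step`) and the `max_len` parameter are hypothetical, and the orthogonal transport between neighboring windows is not modeled here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def project(basis, x):
    """Component of x inside span(basis); basis vectors are assumed orthonormal."""
    out = [0.0] * len(x)
    for b in basis:
        c = dot(b, x)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def turning_angle(step_a, step_b):
    """Angle in radians between two consecutive steps (0.0 if either is zero)."""
    na, nb = norm(step_a), norm(step_b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    cos = max(-1.0, min(1.0, dot(step_a, step_b) / (na * nb)))
    return math.acos(cos)

def clamp_step(basis, step, max_len):
    """Assumed clamping rule: cap the in-subspace length, keep the orthogonal residual."""
    inside = project(basis, step)
    residual = [s - i for s, i in zip(step, inside)]
    length = norm(inside)
    scale = 1.0 if length <= max_len else max_len / length
    return [scale * i + r for i, r in zip(inside, residual)]

# Toy example: a 2-D subspace inside a 3-D state space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
step = [3.0, 4.0, 2.0]
clamped = clamp_step(basis, step, max_len=1.0)
```

In this toy case the in-subspace part of the step is shortened to the cap while the third coordinate, which lies outside the subspace, is left unchanged. Because `project` depends only on the span of the basis, rotating the basis vectors within the subspace gives the same projected length, which is one simple way to read the abstract's claim of invariance to within-window basis choices.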
