
Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

November 3, 2025
Authors: Sekh Mainul Islam, Pepa Atanasova, Isabelle Augenstein
cs.AI

Abstract

Natural Language Explanations (NLEs) describe how Large Language Models (LLMs) make decisions, drawing on both external Context Knowledge (CK) and Parametric Knowledge (PK) stored in model weights. Understanding their interaction is key to assessing the grounding of NLEs, yet it remains underexplored. Prior work has largely examined only single-step generation, typically the final answer, and has modelled PK and CK interaction only as a binary choice in a rank-1 subspace. This overlooks richer forms of interaction, such as complementary or supportive knowledge. We propose a novel rank-2 projection subspace that disentangles PK and CK contributions more accurately and use it for the first multi-step analysis of knowledge interactions across longer NLE sequences. Experiments on four QA datasets and three open-weight instruction-tuned LLMs show that diverse knowledge interactions are poorly represented in a rank-1 subspace but are effectively captured in our rank-2 formulation. Our multi-step analysis reveals that hallucinated NLEs align strongly with the PK direction, context-faithful ones balance PK and CK, and Chain-of-Thought prompting for NLEs shifts generated NLEs toward CK by reducing PK reliance. This work provides the first framework for systematic studies of multi-step knowledge interactions in LLMs through a richer rank-2 subspace disentanglement. Code and data: https://github.com/copenlu/pk-ck-knowledge-disentanglement.
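The abstract does not spell out how the rank-2 projection is computed; the sketch below is a minimal illustration of one plausible formulation, assuming per-step hidden states are available and that `pk_dir` and `ck_dir` are direction vectors estimated from model activations (e.g., by difference-in-means over PK- and CK-dominant examples). All function and variable names here are hypothetical and are not the authors' API.

```python
import numpy as np

def disentangle_rank2(hidden_states: np.ndarray,
                      pk_dir: np.ndarray,
                      ck_dir: np.ndarray) -> np.ndarray:
    """Express each generation step's hidden state in the rank-2
    subspace spanned by a PK direction and a CK direction.

    hidden_states: (T, d) array, one hidden state per generated token.
    pk_dir, ck_dir: (d,) direction vectors (hypothetical: e.g.,
        difference-in-means of PK- vs. CK-dominant activations).

    Returns a (T, 2) array of least-squares coefficients (alpha, beta)
    such that h_t ~= alpha_t * pk_dir + beta_t * ck_dir.
    """
    # Stack the two directions as columns of a (d, 2) design matrix.
    A = np.stack([pk_dir, ck_dir], axis=1)
    # Least-squares coordinates handle correlated (non-orthogonal)
    # directions, unlike a single rank-1 axis.
    coeffs, *_ = np.linalg.lstsq(A, hidden_states.T, rcond=None)
    return coeffs.T  # (T, 2): per-step PK and CK contributions

def score_rank1(hidden_states: np.ndarray,
                pk_dir: np.ndarray,
                ck_dir: np.ndarray) -> np.ndarray:
    """Rank-1 baseline: score each step on one PK-vs-CK axis,
    forcing the binary reading the paper argues against."""
    axis = pk_dir - ck_dir
    return hidden_states @ axis / np.linalg.norm(axis)
```

Tracking the two coefficients across generation steps is what a multi-step analysis of this kind would rest on: under the paper's findings, hallucinated NLEs should show a persistently dominant PK coefficient, context-faithful ones should keep the two in balance, and Chain-of-Thought prompting should shift mass toward the CK coefficient.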