Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

November 3, 2025
Authors: Sekh Mainul Islam, Pepa Atanasova, Isabelle Augenstein
cs.AI

Abstract

Natural Language Explanations (NLEs) describe how Large Language Models (LLMs) make decisions, drawing on both external Context Knowledge (CK) and Parametric Knowledge (PK) stored in model weights. Understanding their interaction is key to assessing the grounding of NLEs, yet it remains underexplored. Prior work has largely examined only single-step generation, typically the final answer, and has modelled PK and CK interaction only as a binary choice in a rank-1 subspace. This overlooks richer forms of interaction, such as complementary or supportive knowledge. We propose a novel rank-2 projection subspace that disentangles PK and CK contributions more accurately and use it for the first multi-step analysis of knowledge interactions across longer NLE sequences. Experiments on four QA datasets and three open-weight instruction-tuned LLMs show that diverse knowledge interactions are poorly represented in a rank-1 subspace but are effectively captured in our rank-2 formulation. Our multi-step analysis reveals that hallucinated NLEs align strongly with the PK direction, context-faithful ones balance PK and CK, and Chain-of-Thought prompting for NLEs shifts generated NLEs toward CK by reducing PK reliance. This work provides the first framework for systematic studies of multi-step knowledge interactions in LLMs through a richer rank-2 subspace disentanglement. Code and data: https://github.com/copenlu/pk-ck-knowledge-disentanglement.
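
To make the contrast at the heart of the abstract concrete (a rank-1 binary choice versus a rank-2 disentanglement), below is a minimal numerical sketch. It assumes PK and CK direction vectors p and c have already been estimated in the model's hidden-state space; the function names, the direction estimates, and the least-squares projection are illustrative assumptions, not the authors' implementation (the linked repository holds the actual method).

```python
import numpy as np

def rank2_coefficients(h, p, c):
    """Least-squares coordinates of a hidden state h in span{p, c}.

    h: (d,) hidden state of one generated NLE token
    p: (d,) assumed estimate of the Parametric Knowledge (PK) direction
    c: (d,) assumed estimate of the Context Knowledge (CK) direction
    Returns (alpha, beta): separate PK and CK contributions.
    """
    B = np.stack([p, c], axis=1)               # (d, 2) rank-2 basis
    w, *_ = np.linalg.lstsq(B, h, rcond=None)  # minimise ||h - B @ w||
    return float(w[0]), float(w[1])

def rank1_coefficient(h, p, c):
    """Rank-1 baseline: scalar projection onto a single PK-vs-CK axis.

    One coordinate can only express a binary trade-off, so complementary
    interactions (large PK *and* large CK) collapse onto a single number.
    """
    d = p - c
    return float(h @ d) / float(d @ d)

# Toy check: a hidden state that genuinely mixes both knowledge sources.
rng = np.random.default_rng(0)
p = rng.normal(size=512)
c = rng.normal(size=512)
h = 0.7 * p + 0.4 * c + 0.05 * rng.normal(size=512)

print(rank2_coefficients(h, p, c))  # ~ (0.7, 0.4): both contributions recovered
print(rank1_coefficient(h, p, c))   # one scalar: the mixture is collapsed
```

Applied per generated token, rank2_coefficients traces a PK/CK trajectory across the whole NLE sequence, which is the kind of multi-step reading the abstract describes: hallucinated NLEs would show a persistently large PK coefficient, while context-faithful ones would balance the two.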