When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models

Abstract

Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to graph learning tasks. By transforming graph topology and node information into graph tokens, GLMs allow LLMs to jointly process structured graph inputs and textual instructions. Yet it remains unclear how LLMs internally interpret these graph tokens and whether the tokens act as meaningful carriers of graph structure. In this work, we study how LLMs process graph information by analyzing graph-token behavior in representative GLM architectures. **Findings.** We find that the internal saliency of graph tokens in GLMs is not equivalent to the utilization of graph information. Graph sink tokens consistently emerge as activation-level outliers: they can be identified by massive activation values along a small set of hidden-state dimensions, and they are biased toward early graph-token positions. However, this activation-level saliency does not imply that these tokens are the main carriers of graph information. Unlike classical attention sinks in language and vision-language models, graph sink tokens do not necessarily attract the largest attention weights from query tokens. Through pruning, repositioning, and swapping interventions, we show that graph sink tokens are not the most important semantic or structural tokens for downstream prediction. **Implications.** Together, these results suggest that, after current GLMs map graph structure into the LLM token space, the resulting graph-token representations do not naturally form a fully usable topology-aware internal representation; instead, they exhibit a decoupling between activation-level saliency and graph-semantic utility. This decoupling points to limitations in existing graph-token construction, placement, and alignment mechanisms.
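
To make the notion of an activation-level outlier concrete, the following minimal sketch flags graph tokens whose largest hidden-state magnitude far exceeds the typical magnitude, and then prunes them as a toy stand-in for a pruning intervention. The threshold rule (a fixed `ratio` times the median magnitude), the function names `find_sink_tokens` and `prune_tokens`, and the toy hidden states are all assumptions made for illustration; they are not the detection criterion or the intervention procedure used in this work.

```python
from statistics import median


def find_sink_tokens(hidden_states, ratio=50.0):
    """Flag tokens with one massive activation (hypothetical criterion).

    hidden_states: list of token vectors (lists of floats) from one layer.
    ratio: assumed threshold multiplier over the median magnitude.
    Returns a list of (token_position, dimension, value) tuples.
    """
    magnitudes = [abs(v) for token in hidden_states for v in token]
    if not magnitudes:
        return []
    typical = median(magnitudes)
    sinks = []
    for pos, token in enumerate(hidden_states):
        # Dimension carrying the largest-magnitude activation for this token.
        dim = max(range(len(token)), key=lambda d: abs(token[d]))
        if abs(token[dim]) > ratio * typical:
            sinks.append((pos, dim, token[dim]))
    return sinks


def prune_tokens(hidden_states, positions):
    """Drop the tokens at the given positions (toy pruning intervention)."""
    drop = set(positions)
    return [tok for i, tok in enumerate(hidden_states) if i not in drop]


# Toy hidden states for four graph tokens with four dimensions each.
# The first token carries one massive activation in dimension 2.
graph_tokens = [
    [0.2, -0.1, 85.0, 0.3],
    [0.1, 0.4, -0.2, 0.3],
    [-0.3, 0.2, 0.1, -0.4],
    [0.2, -0.3, 0.4, 0.1],
]

sinks = find_sink_tokens(graph_tokens)
print(sinks)  # [(0, 2, 85.0)]

pruned = prune_tokens(graph_tokens, [pos for pos, _, _ in sinks])
print(len(pruned))  # 3
```

In an actual study, the pruned sequence would be fed back through the model and the downstream prediction compared against the unpruned run; the sketch stops at the token-selection step, since a large activation alone says nothing about whether the token matters for the prediction.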