

Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data

May 4, 2025
Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianping Fan
cs.AI

Abstract

Attention mechanisms are critical to the success of large language models (LLMs), driving significant advancements in multiple fields. However, for graph-structured data, which requires emphasis on topological connections, they fall short compared to message-passing mechanisms on fixed links, such as those employed by Graph Neural Networks (GNNs). This raises a question: "Does attention fail for graphs in natural language settings?" Motivated by these observations, we embarked on an empirical study from the perspective of attention mechanisms to explore how LLMs process graph-structured data. The goal is to gain deeper insights into the attention behavior of LLMs over graph structures. We uncovered unique phenomena regarding how LLMs apply attention to graph-structured data and analyzed these findings to improve the modeling of such data by LLMs. The primary findings of our research are: 1) While LLMs can recognize graph data and capture text-node interactions, they struggle to model inter-node relationships within graph structures due to inherent architectural constraints. 2) The attention distribution of LLMs across graph nodes does not align with ideal structural patterns, indicating a failure to adapt to graph topology nuances. 3) Neither fully connected attention nor fixed connectivity is optimal; each has specific limitations in its application scenarios. Instead, intermediate-state attention windows improve LLM training performance and seamlessly transition to fully connected windows during inference. Source code: https://github.com/millioniron/LLM_exploration
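The abstract's third finding contrasts three attention regimes over graph nodes: fully connected attention, fixed-link connectivity, and an "intermediate-state" attention window. The following is a minimal illustrative sketch in PyTorch of what such attention masks could look like; the function names, the k-hop reading of the intermediate window, and the toy graph are assumptions made for illustration, not the authors' implementation (see the linked repository for that).

import torch

def fully_connected_mask(num_nodes: int) -> torch.Tensor:
    # Every node token may attend to every other node token (and itself).
    return torch.ones(num_nodes, num_nodes, dtype=torch.bool)

def fixed_link_mask(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    # GNN-style fixed connectivity: attend only to direct neighbors plus self.
    mask = torch.eye(num_nodes, dtype=torch.bool)
    src, dst = edge_index
    mask[src, dst] = True
    mask[dst, src] = True  # treat the toy graph as undirected
    return mask

def intermediate_window_mask(edge_index: torch.Tensor, num_nodes: int, hops: int = 2) -> torch.Tensor:
    # One possible "intermediate state": attention allowed within a k-hop
    # neighborhood, sitting between fixed links (k = 1) and full connectivity.
    adj = fixed_link_mask(edge_index, num_nodes).float()
    reach = torch.eye(num_nodes)
    for _ in range(hops):
        reach = (reach @ adj).clamp(max=1.0)
    return reach.bool()

if __name__ == "__main__":
    # Toy path graph 0-1-2-3.
    edges = torch.tensor([[0, 1, 2], [1, 2, 3]])
    print(fully_connected_mask(4).int())
    print(fixed_link_mask(edges, 4).int())
    print(intermediate_window_mask(edges, 4, hops=2).int())

Any of these boolean masks could be supplied (broadcast per head) as the attn_mask argument of torch.nn.functional.scaled_dot_product_attention, where True marks positions allowed to attend; that is one way to restrict the attention window during training and swap in the fully connected mask at inference, in the spirit of the finding described above.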
