Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data

Abstract

Attention mechanisms are critical to the success of large language models (LLMs), driving significant advancements in multiple fields. However, on graph-structured data, which requires emphasis on topological connections, they fall short of message-passing mechanisms over fixed links, such as those employed by Graph Neural Networks (GNNs). This raises a question: "Does attention fail for graphs in natural language settings?" Motivated by these observations, we conducted an empirical study from the perspective of attention mechanisms to explore how LLMs process graph-structured data, with the goal of gaining deeper insight into the attention behavior of LLMs over graph structures. We uncovered unique phenomena in how LLMs apply attention to graph-structured data and analyzed these findings to improve how LLMs model such data. Our primary findings are: 1) While LLMs can recognize graph data and capture text-node interactions, they struggle to model inter-node relationships within graph structures due to inherent architectural constraints. 2) The attention distribution of LLMs across graph nodes does not align with ideal structural patterns, indicating a failure to adapt to the nuances of graph topology. 3) Neither fully connected attention nor fixed connectivity is optimal; each has specific limitations in its application scenarios. Instead, intermediate-state attention windows improve LLM training performance and transition seamlessly to fully connected windows during inference. Source code: https://github.com/millioniron/LLM_exploration
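The contrast between fully connected attention, fixed connectivity, and an intermediate window can be pictured as three attention masks over graph nodes. The following is a minimal pure-Python sketch under our own assumptions: the helper names (`hop_distances`, `k_hop_mask`, `masked_attention_weights`) are hypothetical, attention is computed per node rather than per token, causal masking is ignored, and the intermediate window is modeled as a k-hop neighborhood. That last choice is purely illustrative and is not necessarily how the paper defines its intermediate-state attention windows.

```python
from collections import deque
import math


def hop_distances(adj, src):
    """BFS hop counts from src; nodes unreachable from src are absent."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_hop_mask(adj, k):
    """Boolean mask: mask[i][j] is True if node i may attend to node j.

    Hypothetical illustration of the spectrum of attention windows:
      k = 1    -> fixed links only (1-hop neighbours plus self, GNN-style)
      k > 1    -> an assumed "intermediate" k-hop window
      k = None -> fully connected attention over all nodes
    """
    n = len(adj)
    if k is None:
        return [[True] * n for _ in range(n)]
    mask = []
    for i in range(n):
        dist = hop_distances(adj, i)
        mask.append([j in dist and dist[j] <= k for j in range(n)])
    return mask


def masked_attention_weights(scores, mask):
    """Row-wise softmax restricted to allowed positions; masked positions get 0.

    Every row allows at least the node itself (distance 0), so the max below
    is never taken over an empty sequence.
    """
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the row maximum over allowed entries for numerical stability.
        m = max(s for s, a in zip(row, allowed) if a)
        exps = [math.exp(s - m) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 as adjacency lists.
    adj = [[1], [0, 2], [1, 3], [2]]
    scores = [[0.0] * 4 for _ in range(4)]  # uniform raw scores
    for k in (1, 2, None):
        w = masked_attention_weights(scores, k_hop_mask(adj, k))
        print(k, [round(x, 3) for x in w[0]])
```

With uniform scores, node 0 spreads its attention over only itself and node 1 when `k=1`, over nodes 0 to 2 when `k=2`, and over all four nodes when the window is fully connected. In a real LLM the same idea would be applied to the token-level attention mask (together with the causal mask), which this node-level sketch deliberately omits.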
