ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall

Abstract

Large Language Models (LLMs) require efficient knowledge editing (KE) to update factual information, yet existing methods degrade significantly in multi-hop factual recall. This failure is particularly acute when edits involve intermediate implicit subjects within reasoning chains. Through causal analysis, we show that this limitation stems from neglecting how chained knowledge is dynamically represented and utilized at the neuron level. We find that during multi-hop reasoning, implicit subjects function as query neurons, which sequentially activate corresponding value neurons across transformer layers to accumulate information toward the final answer, a dynamic that prior KE work has overlooked. Guided by this insight, we propose ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall, a framework that leverages neuron-level attribution to identify and edit these critical query-value (Q-V) pathways. ACE provides a mechanistically grounded solution for multi-hop KE and empirically outperforms state-of-the-art methods by 9.44% on GPT-J and 37.46% on Qwen3-8B. Our analysis further reveals more fine-grained activation patterns in Qwen3 and demonstrates that the semantic interpretability of value neurons is orchestrated by query-driven accumulation. These findings establish a new pathway for advancing KE capabilities based on a principled understanding of internal reasoning mechanisms.
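The abstract does not specify the attribution rule ACE uses. As a rough illustration only, the toy sketch below assumes the common reading of FFN layers as key-value memories and scores neurons with direct logit attribution (a neuron's activation times the projection of its output vector onto the answer token's unembedding). The two-layer setup, the query-to-value score a_q (w_q · k_v), the 3-dimensional residual stream, and every function name are hypothetical choices made for illustration. None of them is ACE's actual procedure.

```python
# Toy sketch only: a hypothetical two-layer illustration of neuron-level
# attribution, NOT ACE's published algorithm. Each FFN neuron i is read as a
# key-value memory slot: an input key k_i and an output value vector w_i.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relu(x):
    return x if x > 0.0 else 0.0

def ffn_activations(keys, h):
    """Neuron activations a_i = relu(k_i . h) for residual-stream state h."""
    return [relu(dot(k, h)) for k in keys]

def ffn_write(h, values, acts):
    """Residual update h + sum_i a_i * w_i (the layer's output added to h)."""
    out = list(h)
    for w, a in zip(values, acts):
        for j, x in enumerate(w):
            out[j] += a * x
    return out

def value_attribution(acts, values, unembed_t):
    """Direct logit attribution to answer token t: a_i * (w_i . u_t)."""
    return [a * dot(w, unembed_t) for a, w in zip(acts, values)]

def query_value_attribution(q_acts, q_values, v_keys):
    """Score[q][v] = a_q * (w_q . k_v): how much lower-layer neuron q's output
    raises the pre-activation of upper-layer neuron v (direct path only)."""
    return [[a_q * dot(w_q, k_v) for k_v in v_keys]
            for a_q, w_q in zip(q_acts, q_values)]

def argmax(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical 3-d residual stream: dim 0 = explicit subject,
# dim 1 = implicit (bridge) subject, dim 2 = final-answer direction.
h0 = [1.0, 0.0, 0.0]

# Layer 1 ("query" candidates): neuron 0 recalls the bridge entity.
q_keys = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
q_values = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
q_acts = ffn_activations(q_keys, h0)  # neuron 1 stays inactive
h1 = ffn_write(h0, q_values, q_acts)

# Layer 2 ("value" candidates): neuron 1 reads the bridge entity.
v_keys = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
v_values = [[0.0, 0.0, -1.0], [0.0, 0.0, 2.0]]
v_acts = ffn_activations(v_keys, h1)  # neuron 0 stays inactive
h2 = ffn_write(h1, v_values, v_acts)

u_answer = [0.0, 0.0, 1.0]  # hypothetical unembedding of the answer token
value_scores = value_attribution(v_acts, v_values, u_answer)
best_value = argmax(value_scores)
qv_scores = query_value_attribution(q_acts, q_values, v_keys)
best_query = argmax([row[best_value] for row in qv_scores])
print(f"value neuron {best_value} <- query neuron {best_query}")
```

In this toy run the pathway from query neuron 0 to value neuron 1 is recovered: the query neuron writes the bridge-entity direction, and that direction is exactly what the value neuron's key reads. An editing method would then target such a pair. The abstract does not describe how ACE performs the edit itself, so the sketch stops at identification.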