Discovering Instruction-Specific Neurons and Experts: An Analytical Framework for the Instruction-Following Capabilities of Large Language Models

Abstract

Fine-tuning has substantially improved the instruction-following capabilities of Large Language Models (LLMs), yet the computational mechanisms underlying these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computation by isolating and analyzing instruction-specific sparse components: neurons in dense models, and both neurons and experts in Mixture-of-Experts (MoE) architectures. We introduce HexaInst, a carefully curated and balanced instruction dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework with three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of how these components are altered. Our experiments demonstrate the functional generality and uniqueness of these components, as well as their critical role in instruction execution. By clarifying the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work offers deeper insight into how LLMs internalize instruction-following behavior, contributing to the trustworthy-LLM community.
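The abstract does not specify how SPARCOM identifies or compares sparse components, so the following is only an illustrative sketch under stated assumptions, not the paper's procedure. It assumes we already have per-sample neuron activations grouped by instruction category, treats a neuron as "active" when its activation is positive, and calls a neuron specific to a category when its activation rate there exceeds its highest rate on any other category by a margin. It then compares the resulting neuron sets (for example, from a base and a fine-tuned model) with Jaccard overlap. The function names `active_rate`, `category_specific_neurons`, and `jaccard`, the threshold of 0, and the margin of 0.5 are all hypothetical choices.

```python
from typing import Dict, List, Set

# Hypothetical sketch: NOT the SPARCOM method, whose details the abstract omits.

def active_rate(acts: List[List[float]], threshold: float = 0.0) -> List[float]:
    """Fraction of samples on which each neuron's activation exceeds `threshold`.

    acts[i][j] is the activation of neuron j on sample i (assumed layout).
    """
    n_samples = len(acts)
    counts = [0] * len(acts[0])
    for row in acts:
        for j, a in enumerate(row):
            if a > threshold:
                counts[j] += 1
    return [c / n_samples for c in counts]


def category_specific_neurons(
    acts_by_cat: Dict[str, List[List[float]]], margin: float = 0.5
) -> Dict[str, Set[int]]:
    """Assumed criterion: neuron j is specific to category c if its activation
    rate on c exceeds its highest rate on any other category by >= margin."""
    rates = {c: active_rate(a) for c, a in acts_by_cat.items()}
    cats = list(rates)
    n_neurons = len(rates[cats[0]])
    result: Dict[str, Set[int]] = {c: set() for c in cats}
    for c in cats:
        for j in range(n_neurons):
            best_other = max((rates[o][j] for o in cats if o != c), default=0.0)
            if rates[c][j] - best_other >= margin:
                result[c].add(j)
    return result


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between two neuron sets, e.g. base vs. fine-tuned model."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy activations for two hypothetical categories, 3 neurons each.
    acts = {
        "code": [[1.0, 0.0, 0.2], [0.9, 0.0, 0.1]],
        "math": [[0.0, 1.2, 0.3], [-0.1, 0.8, 0.2]],
    }
    print(category_specific_neurons(acts))  # {'code': {0}, 'math': {1}}
    # Compare a base-model set {0} with a (made-up) fine-tuned set {0, 2}.
    print(jaccard({0}, {0, 2}))  # 0.5
```

Neuron 2 fires on both categories, so it is shared rather than category-specific. The same routine could, under the same assumptions, be applied to MoE experts by feeding it 0/1 router-selection indicators per expert instead of neuron activations.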