Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

Abstract

Fine-tuning has significantly improved the instruction-following capabilities of Large Language Models (LLMs), yet the computational mechanisms behind these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computation by isolating and analyzing instruction-specific sparse components: neurons in dense models, and both neurons and experts in Mixture-of-Experts (MoE) architectures. We introduce HexaInst, a carefully curated and balanced instruction dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate the functional generality and uniqueness of these components and their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work provides deeper insight into how LLMs internalize instruction-following behavior, contributing to the trustworthy LLM community.


