Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

May 27, 2025
作者: Junyan Zhang, Yubo Gao, Yibo Yan, Jungang Li, Zhaorui Hou, Sicheng Tao, Shuliang Liu, Song Dai, Yonghua Hei, Junzhuo Li, Xuming Hu
cs.AI

Abstract

The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instructional dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate the functional generality and uniqueness of these components and their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work offers the trustworthy-LLM community deeper insight into how LLMs internalize instruction-following behavior.
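
The abstract does not spell out SPARCOM's identification criterion. As one plausible illustration of how instruction-specific neurons might be located, the sketch below contrasts per-neuron firing rates on instruction prompts against a general-text baseline and keeps the most instruction-skewed neurons. The activation matrices, the positive-activation threshold, and the top_k parameter are all assumptions made for demonstration, not the paper's actual method.

```python
# Illustrative sketch (not SPARCOM itself): select neurons whose firing
# rate on instruction prompts far exceeds their rate on general text.
import numpy as np

def find_instruction_specific_neurons(
    instr_acts: np.ndarray,   # (num_instr_prompts, num_neurons) activations
    general_acts: np.ndarray, # (num_general_prompts, num_neurons) baseline
    top_k: int = 50,          # assumed budget of neurons to keep
) -> np.ndarray:
    """Return indices of the top_k most instruction-skewed neurons."""
    # Fraction of prompts on which each neuron is active (activation > 0
    # is an assumed activity threshold).
    instr_rate = (instr_acts > 0).mean(axis=0)
    general_rate = (general_acts > 0).mean(axis=0)
    # Rank neurons by how much more often they fire on instructions.
    skew = instr_rate - general_rate
    return np.argsort(skew)[-top_k:][::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic activations: 200 prompts x 1024 neurons, with the first
    # 50 neurons artificially biased toward the instruction condition.
    general = rng.normal(-0.5, 1.0, size=(200, 1024))
    instr = rng.normal(-0.5, 1.0, size=(200, 1024))
    instr[:, :50] += 1.5
    picked = find_instruction_specific_neurons(instr, general)
    print(f"{(picked < 50).mean():.0%} of selected neurons are true positives")
```

On synthetic data like the above, a simple firing-rate contrast recovers the planted neurons; the paper's framework additionally evaluates the generality and uniqueness of such components and compares how they change under fine-tuning.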
