Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Abstract

Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although various PEFT methods exist for dense-architecture LLMs, PEFT for sparse-architecture LLMs remains underexplored. In this work, we study PEFT for LLMs with the Mixture-of-Experts (MoE) architecture. Our contributions are threefold: (1) We investigate the dispersion degree of the activated experts in customized tasks and find that the routing distribution for a specific task tends to be highly concentrated, while the distribution of activated experts varies significantly across tasks. (2) We propose Expert-Specialized Fine-Tuning (ESFT), which tunes the experts most relevant to a downstream task while freezing the other experts and modules. Experimental results demonstrate that our method not only improves tuning efficiency but also matches or even surpasses the performance of full-parameter fine-tuning. (3) We further analyze the impact of the MoE architecture on expert-specialized fine-tuning. We find that MoE models with finer-grained experts are better able to select the combination of experts most relevant to a downstream task, which improves both training efficiency and effectiveness.
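The selection step behind ESFT can be illustrated with a minimal sketch. The abstract does not say how expert relevance is scored, so the sketch assumes relevance is simply how often the router picks each expert on task data, and it keeps the smallest set of experts that accounts for a chosen share of routing decisions. The names `select_experts`, `trainable_mask`, and `coverage`, as well as the toy routing log, are hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_experts(routed_expert_ids, num_experts, coverage=0.8):
    """Return the smallest set of experts covering `coverage` of routing choices.

    Assumption (not from the paper): relevance = routing frequency on task data.
    """
    counts = Counter(routed_expert_ids)
    total = sum(counts.values())
    if total == 0:
        return []
    # Rank experts by how often the router chose them; ties break on expert id.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    selected, covered = [], 0
    for expert in ranked:
        if covered >= coverage * total:
            break
        selected.append(expert)
        covered += counts[expert]
    return sorted(selected)


def trainable_mask(selected, num_experts):
    """True for experts that would be tuned, False for experts that stay frozen."""
    chosen = set(selected)
    return [e in chosen for e in range(num_experts)]


# Hypothetical routing log for one layer: the expert id chosen for each token.
routing_log = [2, 2, 5, 2, 5, 2, 2, 7, 5, 2]
chosen = select_experts(routing_log, num_experts=8, coverage=0.8)
mask = trainable_mask(chosen, num_experts=8)
print(chosen)  # [2, 5]
print(mask)
```

In this toy log, two of eight experts receive 90% of the routing decisions, which mirrors the concentrated routing described in point (1). In a real training loop, the mask would decide which expert parameters receive gradient updates while everything else stays frozen.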
