GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

Abstract

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, these capabilities typically come with a substantial model size, which creates significant challenges for deployment and inference. Structured pruning of model parameters is a promising way to reduce computational costs at deployment time, but current methods focus primarily on pruning a single model. In this work, we develop a new strategy that compresses models by strategically combining or merging layers from fine-tuned model variants. Aggregating the capabilities emphasized in different fine-tunes preserves the original model's abilities. We pose the optimal tailoring of these LLMs as a zero-order optimization problem over a search space that supports three operations: (1) layer removal, (2) layer selection from different candidate models, and (3) layer merging. Our experiments show that this approach yields competitive pruned models. For example, on the Llama2-13B model family, our compressed models retain approximately 97.3% of the original performance while removing ~25% of the parameters, significantly outperforming previous state-of-the-art methods. The code is available at https://github.com/Guinan-Su/auto-merge-llm.
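To make the three-operation search space concrete, here is a minimal sketch. It is not the authors' implementation, and several parts are assumptions. Each "layer" is a flat list of floats standing in for a weight tensor. Merging is modeled as a plain weighted average of the same layer position across candidate models. The zero-order optimizer is simple random search, a stand-in for whatever optimizer the paper actually uses. All names (`build_model`, `random_plan`, `random_search`, the `score` callback) are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A "layer" is a flat list of floats here (a stand-in for a real weight tensor).
Layer = List[float]
Model = List[Layer]
# Hypothetical per-position encoding (not the authors' format):
#   ("remove",)            -> drop layer i
#   ("select", k)          -> copy layer i from candidate model k
#   ("merge", (w0, w1, ...)) -> weighted average of layer i across all candidates
Op = Tuple


def build_model(candidates: Sequence[Model], plan: Sequence[Op]) -> Model:
    """Assemble a pruned model by applying one operation per layer position."""
    out: Model = []
    for i, op in enumerate(plan):
        if op[0] == "remove":
            continue
        if op[0] == "select":
            out.append(list(candidates[op[1]][i]))
        elif op[0] == "merge":
            weights = op[1]
            total = sum(weights)
            dim = len(candidates[0][i])
            # Assumed merge rule: normalized linear interpolation of parameters.
            out.append([
                sum(w * cand[i][j] for w, cand in zip(weights, candidates)) / total
                for j in range(dim)
            ])
        else:
            raise ValueError(f"unknown op {op[0]!r}")
    return out


def random_plan(num_layers: int, num_models: int, num_remove: int,
                rng: random.Random) -> List[Op]:
    """Sample a plan that removes exactly num_remove layers."""
    removed = set(rng.sample(range(num_layers), num_remove))
    plan: List[Op] = []
    for i in range(num_layers):
        if i in removed:
            plan.append(("remove",))
        elif rng.random() < 0.5:
            plan.append(("select", rng.randrange(num_models)))
        else:
            # Strictly positive weights so the normalizing sum is never zero.
            plan.append(("merge", tuple(rng.random() + 1e-6 for _ in range(num_models))))
    return plan


def random_search(candidates: Sequence[Model], num_remove: int,
                  score: Callable[[Model], float], budget: int,
                  seed: int = 0) -> Tuple[Optional[List[Op]], float]:
    """Zero-order search: only evaluates score(model), never a gradient."""
    rng = random.Random(seed)
    best_plan: Optional[List[Op]] = None
    best_score = float("-inf")
    for _ in range(budget):
        plan = random_plan(len(candidates[0]), len(candidates), num_remove, rng)
        s = score(build_model(candidates, plan))
        if s > best_score:
            best_plan, best_score = plan, s
    return best_plan, best_score


if __name__ == "__main__":
    # Two toy "fine-tunes", each with 4 layers of width 3.
    gen = random.Random(1)
    cands = [[[gen.gauss(0, 1) for _ in range(3)] for _ in range(4)] for _ in range(2)]

    def proxy(m: Model) -> float:
        # Hypothetical objective; in practice this would be a benchmark score.
        return -sum(abs(x) for layer in m for x in layer)

    # Removing 1 of 4 layers mirrors the ~25% reduction mentioned above.
    plan, s = random_search(cands, num_remove=1, score=proxy, budget=50)
    print(plan, s)
```

The encoding keeps one decision per layer position, so the parameter budget is enforced simply by fixing how many positions are set to `remove`. A real search would evaluate each candidate on held-out tasks and would likely use a more sample-efficient zero-order optimizer than random search.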
