LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

Abstract

Large Language Models (LLMs) have transformed the landscape of artificial intelligence, but their enormous size poses significant challenges in terms of computational cost. We introduce LoRAShear, a novel and efficient approach to structurally prune LLMs and recover their knowledge. Given a general LLM, LoRAShear first creates dependency graphs to discover minimally removable structures and analyzes the knowledge distribution. It then performs progressive structured pruning on LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the knowledge lost during pruning, LoRAShear studies and proposes a dynamic fine-tuning scheme with dynamic data adaptors that effectively narrows the performance gap to the full models. Numerical results demonstrate that, using only one GPU within a couple of GPU days, LoRAShear reduces the footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms the state of the art. The source code will be available at https://github.com/microsoft/lorashear.
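To make the idea of structured pruning on a LoRA-adapted layer concrete, the following is a minimal sketch under stated assumptions, not the LoRAShear algorithm itself. It assumes a single linear layer computing y = (W + BA)x and removes whole output rows of W and B together. The importance score (the L2 norm of each merged row), the function name prune_output_rows, and the toy matrices are all hypothetical choices made for illustration; the abstract does not specify the actual criterion.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def prune_output_rows(W, B, A, keep_ratio):
    """Drop the least important output rows of a LoRA-adapted linear layer.

    The layer computes y = (W + B @ A) x with W (out x in), B (out x r),
    A (r x in). Importance is the L2 norm of each merged row, which is an
    illustrative assumption and not the criterion used by LoRAShear.
    """
    delta = matmul(B, A)
    merged = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
    scores = [math.sqrt(sum(v * v for v in row)) for row in merged]
    n_keep = max(1, round(keep_ratio * len(W)))
    ranked = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    # A row of W and the matching row of B are removed together, so the
    # pruned layer stays a valid, smaller dense layer; A is unchanged.
    return [W[i] for i in keep], [B[i] for i in keep], keep


# Toy layer with 5 output rows, 2 inputs, and LoRA rank 1.
W = [[1.0, 0.0], [0.0, 2.0], [0.1, 0.0], [3.0, 1.0], [0.0, 1.5]]
B = [[0.0], [1.0], [0.0], [0.5], [0.0]]
A = [[1.0, 1.0]]

# Keeping 80% of the rows mirrors the 20% footprint reduction quoted above.
W_pruned, B_pruned, kept = prune_output_rows(W, B, A, keep_ratio=0.8)
print(kept)  # [0, 1, 3, 4]
```

In this toy case the row with the smallest merged norm (index 2) is removed. A real pipeline would additionally need the dependency analysis and the knowledge recovery stage described in the abstract, neither of which is modeled here.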
