Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Abstract

The rapid growth of Large Language Models (LLMs) poses significant challenges for the computational resources required for fine-tuning and deployment. Recent advances in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper comprehensively discusses innovative approaches that combine low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Integrating these methodologies yields robust solutions for compressing and fine-tuning large pre-trained models. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs, making them more accessible for deployment in resource-constrained environments. The resulting models exhibit reduced memory footprints and faster inference times, paving the way for more practical and scalable applications of LLMs. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
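
To make the idea of combining low-rank adapters with a weight-sharing super-network more concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The class name `ElasticLoRALinear`, the `rank` argument, the helper `sample_ranks`, the fixed scaling `alpha / max_rank`, and the random stand-in for the pre-trained weight are all hypothetical. The sketch shows one way sub-adapters of different ranks could share a single set of adapter weights by slicing.

```python
import numpy as np


class ElasticLoRALinear:
    """Frozen linear layer with a low-rank adapter whose active rank is selectable.

    Hypothetical illustration: a sub-adapter of rank r reuses the first r rows
    of `lora_a` and the first r columns of `lora_b`, so all candidate ranks
    share one set of adapter weights (a weight-sharing super-network).
    """

    def __init__(self, in_features, out_features, max_rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-in for a frozen pre-trained weight matrix.
        self.weight = rng.standard_normal((out_features, in_features)) * 0.02
        # Down-projection A with shape (max_rank, in_features).
        self.lora_a = rng.standard_normal((max_rank, in_features)) * 0.02
        # Up-projection B with shape (out_features, max_rank); zero-initialized
        # so the adapter starts as a no-op.
        self.lora_b = np.zeros((out_features, max_rank))
        self.max_rank = max_rank
        # Assumption: one fixed scale shared by every rank choice.
        self.scale = alpha / max_rank

    def forward(self, x, rank):
        """Apply the frozen layer plus the sub-adapter of the given rank."""
        if not 0 <= rank <= self.max_rank:
            raise ValueError(f"rank must be in [0, {self.max_rank}], got {rank}")
        a = self.lora_a[:rank]      # shared slice, shape (rank, in_features)
        b = self.lora_b[:, :rank]   # shared slice, shape (out_features, rank)
        return x @ self.weight.T + self.scale * (x @ a.T) @ b.T

    def adapter_params(self, rank):
        """Number of adapter parameters active at the given rank."""
        in_features = self.lora_a.shape[1]
        out_features = self.lora_b.shape[0]
        return rank * (in_features + out_features)


def sample_ranks(num_layers, rank_choices, rng):
    """Pick one rank per layer uniformly at random (one sub-network configuration)."""
    return [int(rng.choice(rank_choices)) for _ in range(num_layers)]
```

In this sketch, calling `forward` with a smaller `rank` activates a sub-adapter with fewer parameters. A search procedure could then pick one rank per layer, for example via `sample_ranks`, subject to a deployment budget. How the methods surveyed in the paper actually train and search such configurations is not specified here.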
