Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
January 23, 2025
Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
cs.AI
Abstract
The rapid expansion of Large Language Models (LLMs) has posed significant
challenges regarding the computational resources required for fine-tuning and
deployment. Recent advancements in low-rank adapters have demonstrated their
efficacy in parameter-efficient fine-tuning (PEFT) of these models. This
retrospective paper comprehensively discusses innovative approaches that
synergize low-rank representations with Neural Architecture Search (NAS)
techniques, particularly weight-sharing super-networks. Robust solutions for
compressing and fine-tuning large pre-trained models are developed by
integrating these methodologies. Our analysis highlights the potential of these
combined strategies to democratize the use of LLMs, making them more accessible
for deployment in resource-constrained environments. The resulting models
exhibit reduced memory footprints and faster inference times, paving the way
for more practical and scalable applications of LLMs. Models and code are
available at
https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
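To make the combination of low-rank adapters and weight-sharing super-networks concrete, below is a minimal sketch, not the authors' implementation, of an "elastic" LoRA layer: the frozen base weights receive a low-rank update W + (alpha/r)·BA, and the active rank r can be shrunk to any candidate value so that a NAS procedure can score sub-adapters that all share one set of weights. The class name, candidate ranks, scaling, and initialization here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ElasticLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose rank is searchable.

    Weight sharing: the rank-r sub-adapter reuses the first r rows of A and
    the first r columns of B from the largest adapter, so no extra parameters
    are stored per candidate configuration.
    """

    def __init__(self, in_features, out_features, max_rank=16,
                 candidate_ranks=(4, 8, 16), alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Pre-trained weights stay frozen: only the adapter is trained (PEFT).
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        # Super-network stores only the largest adapter.
        self.lora_A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.candidate_ranks = candidate_ranks
        self.alpha = alpha
        self.active_rank = max_rank

    def set_active_rank(self, r):
        assert r in self.candidate_ranks, f"rank {r} not in {self.candidate_ranks}"
        self.active_rank = r

    def forward(self, x):
        r = self.active_rank
        # Rank-r slice of the shared adapter: (out, r) @ (r, in) -> (out, in).
        delta = (self.lora_B[:, :r] @ self.lora_A[:r, :]) * (self.alpha / r)
        return self.base(x) + nn.functional.linear(x, delta)


if __name__ == "__main__":
    layer = ElasticLoRALinear(64, 64)
    x = torch.randn(2, 64)
    # During super-network training, a candidate rank would be sampled per step;
    # during search, each candidate is evaluated and the best accuracy/size
    # trade-off is kept, then the chosen adapter can be merged into the base weights.
    for r in layer.candidate_ranks:
        layer.set_active_rank(r)
        print(r, layer(x).shape)
```

Smaller active ranks reduce the adapter's parameter count and, once merged or pruned, the deployed model's memory footprint, which is the compression/latency trade-off the abstract refers to.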