Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
January 23, 2025
Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
cs.AI
Abstract
The rapid expansion of Large Language Models (LLMs) has posed significant
challenges regarding the computational resources required for fine-tuning and
deployment. Recent advancements in low-rank adapters have demonstrated their
efficacy in parameter-efficient fine-tuning (PEFT) of these models. This
retrospective paper comprehensively discusses innovative approaches that
synergize low-rank representations with Neural Architecture Search (NAS)
techniques, particularly weight-sharing super-networks. Robust solutions for
compressing and fine-tuning large pre-trained models are developed by
integrating these methodologies. Our analysis highlights the potential of these
combined strategies to democratize the use of LLMs, making them more accessible
for deployment in resource-constrained environments. The resulting models
exhibit reduced memory footprints and faster inference times, paving the way
for more practical and scalable applications of LLMs. Models and code are
available at
https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
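To make the combination of low-rank adapters and weight-sharing super-networks concrete, below is a minimal sketch, not the authors' implementation, of an "elastic" LoRA layer: the frozen base weights receive a low-rank update W + (alpha/r)·BA, and the active rank r can be shrunk to any candidate value so that a NAS procedure can score sub-adapters that all share one set of weights. The class name, candidate ranks, scaling, and initialization here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ElasticLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose rank is searchable.

    Weight sharing: the rank-r sub-adapter reuses the first r rows of A and
    the first r columns of B from the largest adapter, so no extra parameters
    are stored per candidate configuration.
    """

    def __init__(self, in_features, out_features, max_rank=16,
                 candidate_ranks=(4, 8, 16), alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Pre-trained weights stay frozen: only the adapter is trained (PEFT).
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        # Super-network stores only the largest adapter.
        self.lora_A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.candidate_ranks = candidate_ranks
        self.alpha = alpha
        self.active_rank = max_rank

    def set_active_rank(self, r):
        assert r in self.candidate_ranks, f"rank {r} not in {self.candidate_ranks}"
        self.active_rank = r

    def forward(self, x):
        r = self.active_rank
        # Rank-r slice of the shared adapter: (out, r) @ (r, in) -> (out, in).
        delta = (self.lora_B[:, :r] @ self.lora_A[:r, :]) * (self.alpha / r)
        return self.base(x) + nn.functional.linear(x, delta)


if __name__ == "__main__":
    layer = ElasticLoRALinear(64, 64)
    x = torch.randn(2, 64)
    # During super-network training, a candidate rank would be sampled per step;
    # during search, each candidate is evaluated and the best accuracy/size
    # trade-off is kept, then the chosen adapter can be merged into the base weights.
    for r in layer.candidate_ranks:
        layer.set_active_rank(r)
        print(r, layer(x).shape)
```

Smaller active ranks reduce the adapter's parameter count and, once merged or pruned, the deployed model's memory footprint, which is the compression/latency trade-off the abstract refers to.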