LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
April 29, 2024
Authors: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
cs.AI
Abstract
Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted
methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models
(LLMs). LoRA reduces the number of trainable parameters and memory usage while
achieving comparable performance to full fine-tuning. We aim to assess the
viability of training and serving LLMs fine-tuned with LoRA in real-world
applications. First, we measure the quality of LLMs fine-tuned with quantized
low rank adapters across 10 base models and 31 tasks for a total of 310 models.
We find that 4-bit LoRA fine-tuned models outperform base models by 34 points
and GPT-4 by 10 points on average. Second, we investigate the most effective
base models for fine-tuning and assess the correlative and predictive
capacities of task complexity heuristics in forecasting the outcomes of
fine-tuning. Finally, we evaluate the latency and concurrency capabilities of
LoRAX, an open-source Multi-LoRA inference server that facilitates the
deployment of multiple LoRA fine-tuned models on a single GPU using shared base
model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web
application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA
A100 GPU with 80GB memory. LoRA Land highlights the quality and
cost-effectiveness of employing multiple specialized LLMs over a single,
general-purpose LLM.
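
The 4-bit LoRA fine-tuning setup described in the abstract can be reproduced in outline with the Hugging Face transformers, peft, and bitsandbytes libraries. The sketch below is illustrative only and is not the authors' training pipeline (which is not specified here); the base model name, LoRA rank, and target modules are assumptions chosen for demonstration.

```python
# Minimal sketch of 4-bit (QLoRA-style) LoRA fine-tuning with Hugging Face PEFT.
# Illustrative only: the model name, rank, and target modules are assumptions,
# not the exact configuration used in the LoRA Land experiments.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"  # one of the base models evaluated in the paper

# Load the base model with 4-bit NF4 quantization to cut memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices receive gradients.
lora_config = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Because only the adapter matrices are trained while the quantized base weights stay frozen, per-task fine-tuning is cheap enough to produce many specialized models, which is what makes the 310-model study practical.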
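Serving many task-specific adapters from one GPU with LoRAX follows the pattern sketched below: the server loads the shared base model once and applies the requested LoRA adapter per request via dynamic adapter loading. The endpoint URL, adapter identifier, and the exact client call are assumptions for illustration; see the LoRAX repository (https://github.com/predibase/lorax) for the authoritative API.

```python
# Minimal sketch of querying a LoRAX server that hosts multiple LoRA adapters
# over a shared base model on a single GPU. The endpoint, adapter ID, and client
# signature are assumptions, not verified against the LoRAX documentation.
from lorax import Client  # assumed to come from the lorax-client package

client = Client("http://127.0.0.1:8080")  # LoRAX server with a Mistral-7B base

prompt = "Classify the sentiment of this review: 'The product arrived broken.'"

# Each request names the fine-tuned adapter to apply; LoRAX loads it dynamically
# and serves it alongside other adapters using the shared base model weights.
response = client.generate(
    prompt,
    adapter_id="example-org/sentiment-adapter",  # hypothetical adapter ID
    max_new_tokens=32,
)
print(response.generated_text)
```

Routing requests to specialized adapters this way is the basis of the cost claim in the abstract: one A100 can host 25 fine-tuned Mistral-7B variants instead of 25 separate model deployments.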