LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
April 29, 2024
Authors: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
cs.AI
Abstract
Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted
methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models
(LLMs). LoRA reduces the number of trainable parameters and memory usage while
achieving comparable performance to full fine-tuning. We aim to assess the
viability of training and serving LLMs fine-tuned with LoRA in real-world
applications. First, we measure the quality of LLMs fine-tuned with quantized
low rank adapters across 10 base models and 31 tasks for a total of 310 models.
We find that 4-bit LoRA fine-tuned models outperform base models by 34 points
and GPT-4 by 10 points on average. Second, we investigate the most effective
base models for fine-tuning and assess the correlative and predictive
capacities of task complexity heuristics in forecasting the outcomes of
fine-tuning. Finally, we evaluate the latency and concurrency capabilities of
LoRAX, an open-source Multi-LoRA inference server that facilitates the
deployment of multiple LoRA fine-tuned models on a single GPU using shared base
model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web
application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA
A100 GPU with 80GB memory. LoRA Land highlights the quality and
cost-effectiveness of employing multiple specialized LLMs over a single,
general-purpose LLM.
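For readers unfamiliar with the setup the abstract describes, the sketch below shows how a 4-bit quantized base model can be paired with a low-rank adapter so that only the small factored update (roughly ΔW = (α/r)·BA, with rank r much smaller than the weight dimensions) is trained. This is a minimal illustration using the Hugging Face transformers/peft/bitsandbytes stack; the base model name, rank, and target modules are assumptions for illustration, not the exact configuration used in the report.

```python
# Minimal sketch (illustrative, not the report's exact configuration):
# load a base model with 4-bit quantized weights and attach a LoRA adapter,
# so only the low-rank matrices A and B are trainable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "mistralai/Mistral-7B-v0.1"  # assumption: one of the 10 base models

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base_id)

lora_config = LoraConfig(
    r=8,                                    # adapter rank (hypothetical value)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],    # attention projections (assumption)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Only a small fraction of parameters (typically well under 1%) are trainable.
model.print_trainable_parameters()
```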
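On the serving side, the abstract notes that LoRAX keeps a single copy of the shared base model weights on the GPU and loads LoRA adapters dynamically per request. The hedged sketch below shows what a client request against a locally running LoRAX instance could look like; the adapter names and local URL are hypothetical, and the /generate endpoint with an adapter_id parameter follows LoRAX's TGI-compatible REST API.

```python
# Hedged sketch: route the same prompt to two different LoRA adapters hosted by
# a single LoRAX server. Adapter names are hypothetical; the /generate endpoint
# and the adapter_id parameter follow LoRAX's TGI-compatible REST API.
import requests

LORAX_URL = "http://localhost:8080/generate"  # assumption: default local deployment

prompt = "Classify the sentiment of this review: 'Arrived late and damaged.'"

for adapter_id in ["my-org/sentiment-adapter", "my-org/summarization-adapter"]:
    response = requests.post(
        LORAX_URL,
        json={
            "inputs": prompt,
            "parameters": {
                "adapter_id": adapter_id,  # which fine-tuned adapter to apply
                "max_new_tokens": 64,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    print(adapter_id, "->", response.json()["generated_text"])
```

Because the base weights are shared across requests, adding another specialized task costs only the small adapter weights, which is what makes hosting 25 fine-tuned Mistral-7B variants on a single 80GB A100 feasible.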