LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
April 29, 2024
作者: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
cs.AI
Abstract
Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted
methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models
(LLMs). LoRA reduces the number of trainable parameters and memory usage while
achieving comparable performance to full fine-tuning. We aim to assess the
viability of training and serving LLMs fine-tuned with LoRA in real-world
applications. First, we measure the quality of LLMs fine-tuned with quantized
low rank adapters across 10 base models and 31 tasks for a total of 310 models.
We find that 4-bit LoRA fine-tuned models outperform base models by 34 points
and GPT-4 by 10 points on average. Second, we investigate the most effective
base models for fine-tuning and assess the correlative and predictive
capacities of task complexity heuristics in forecasting the outcomes of
fine-tuning. Finally, we evaluate the latency and concurrency capabilities of
LoRAX, an open-source Multi-LoRA inference server that facilitates the
deployment of multiple LoRA fine-tuned models on a single GPU using shared base
model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web
application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA
A100 GPU with 80GB memory. LoRA Land highlights the quality and
cost-effectiveness of employing multiple specialized LLMs over a single,
general-purpose LLM.
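The parameter savings the abstract describes come from LoRA's low-rank decomposition: the frozen base weight W is augmented with a trainable update B·A of rank r, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. A minimal NumPy sketch, with hypothetical dimensions and rank (the paper does not specify these here), illustrates the idea:

```python
import numpy as np

# Hypothetical dimensions for one projection matrix; r is an assumed LoRA rank.
d_in, d_out, r = 4096, 4096, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight (not trained)
A = 0.01 * rng.standard_normal((r, d_in))  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the adapter starts as a no-op

def forward(x, alpha=16.0):
    """Base projection plus the scaled low-rank update: (W + (alpha/r) * B @ A) @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Before training, B is zero and the adapted model matches the base model exactly.
assert np.allclose(forward(x), W @ x)

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"trainable: {lora_params:,} of {full_params:,} ({lora_params / full_params:.2%})")
```

With these assumed values the adapter trains about 0.4% of the layer's parameters, which is also what makes multi-LoRA serving (as in LoRAX) practical: many small adapters can share one copy of the frozen base weights on a single GPU.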