LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
April 29, 2024
作者: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
cs.AI
Abstract
Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted
methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models
(LLMs). LoRA reduces the number of trainable parameters and memory usage while
achieving comparable performance to full fine-tuning. We aim to assess the
viability of training and serving LLMs fine-tuned with LoRA in real-world
applications. First, we measure the quality of LLMs fine-tuned with quantized
low rank adapters across 10 base models and 31 tasks for a total of 310 models.
We find that 4-bit LoRA fine-tuned models outperform base models by 34 points
and GPT-4 by 10 points on average. Second, we investigate the most effective
base models for fine-tuning and assess the correlative and predictive
capacities of task complexity heuristics in forecasting the outcomes of
fine-tuning. Finally, we evaluate the latency and concurrency capabilities of
LoRAX, an open-source Multi-LoRA inference server that facilitates the
deployment of multiple LoRA fine-tuned models on a single GPU using shared base
model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web
application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA
A100 GPU with 80GB memory. LoRA Land highlights the quality and
cost-effectiveness of employing multiple specialized LLMs over a single,
general-purpose LLM.
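The parameter savings the abstract describes come from LoRA's low-rank decomposition: the frozen base weight W is augmented with a trainable update B·A of rank r, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. A minimal NumPy sketch, with hypothetical dimensions and rank (the paper does not specify these here), illustrates the idea:

```python
import numpy as np

# Hypothetical dimensions for one projection matrix; r is an assumed LoRA rank.
d_in, d_out, r = 4096, 4096, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight (not trained)
A = 0.01 * rng.standard_normal((r, d_in))  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the adapter starts as a no-op

def forward(x, alpha=16.0):
    """Base projection plus the scaled low-rank update: (W + (alpha/r) * B @ A) @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Before training, B is zero and the adapted model matches the base model exactly.
assert np.allclose(forward(x), W @ x)

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"trainable: {lora_params:,} of {full_params:,} ({lora_params / full_params:.2%})")
```

With these assumed values the adapter trains about 0.4% of the layer's parameters, which is also what makes multi-LoRA serving (as in LoRAX) practical: many small adapters can share one copy of the frozen base weights on a single GPU.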